Site Reliability Engineer

5 days ago


Jersey City, NJ, United States Purple Drive Full time

Local candidates preferred.

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Apache Flink, Kubernetes, and automation. The ideal candidate will be responsible for designing, deploying, and maintaining scalable, resilient systems, while ensuring high availability and performance in production environments. This role requires a solid background in distributed systems, container orchestration, and DevOps practices.

Key Responsibilities

  • Design, implement, and maintain scalable Apache Flink deployments on Kubernetes.
  • Develop automation tools and scripts to streamline deployment, monitoring, and maintenance of Flink jobs and infrastructure.
  • Ensure high availability, scalability, and reliability of production systems.
  • Collaborate with development and infrastructure teams to optimize application performance.
  • Build and manage monitoring/alerting systems using Prometheus, Grafana, ELK stack, or similar tools.
  • Work with cloud platforms (AWS, GCP, Azure) to design and manage infrastructure.
  • Apply best practices for networking, security, and container orchestration.
  • Troubleshoot complex production issues and drive root cause analysis.
  • Contribute to CI/CD pipelines for deployment automation.
  • Participate in on-call rotations to ensure uptime and reliability.

Required Skills & Qualifications

  • Strong hands-on experience with Apache Flink in production environments.
  • Expertise in Kubernetes (Helm, Operators, CRDs).
  • Proficiency in scripting languages (Python, Bash, Go).
  • Experience with monitoring & observability tools (Prometheus, Grafana, ELK, etc.).
  • Solid understanding of cloud platforms (AWS, GCP, Azure).
  • Strong knowledge of networking, security, and container orchestration.
  • Familiarity with CI/CD pipelines and DevOps practices.
  • Excellent problem-solving, debugging, and communication skills.


  • Jersey City, NJ, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer (AWS) (SRE). Location: Jersey City, NJ (3 days WFO, 2 days WFH). Duration: 6+ months. Position Responsibilities: Site Reliability Engineer (AWS) (SRE). Work Location: Jersey City, New Jersey. Only nearby candidates will be considered (3 days WFO, 2 days WFH). 1 Zoom/tech interview and 1 onsite interview with...


  • Jersey City, NJ, United States Concord IT Systems Full time

    Job Order Details. Position Title: Site Reliability Engineer (AWS) (SRE). Position Responsibilities: Site Reliability Engineer (AWS) (SRE). Work Location: Jersey City, New Jersey. Only nearby candidates will be considered (3 days WFO, 2 days WFH). 1 Zoom/tech interview and 1 onsite interview with leadership. Looking for strong AWS experience, 2-3 years recent...


  • Jersey City, NJ, United States Diverse Lynx Full time

    Title: Site Reliability Engineering - Senior Engineer. Location: Jersey City, NJ / Edison, NJ. Type: Full-time. Job Description: Must-have skills: Python or Java; Splunk Cloud; ThousandEyes; cloud platforms such as AWS, Google Cloud, or Azure; Docker and Kubernetes. Responsibilities: System Reliability: Work with production support teams to implement scalable,...


  • Jersey City, NJ, United States Verisk Analytics Full time

    Job Description: As a Senior Site Reliability Engineer, you'll bridge the gap between software development and operations, applying software engineering principles to infrastructure and operations problems. You'll help design, build, and maintain the systems that keep our services reliable and scalable while working closely with development teams to improve...


  • Jersey City, NJ, United States Simple Solutions Full time

    Job Description: Sr. Fortinet SDWAN Engineer - JPMC, 5 days a week on site. Location: Plano, TX; Columbus, OH; or Jersey City, NJ (Onsite, 5 days/week). Contract Duration: 12 months. Job Summary: Seeking a skilled Network Engineer with experience in the Fortinet suite and expertise in SD-WAN technologies. The ideal candidate will design, implement, and maintain...


  • Jersey City, NJ, United States Blt Management Llc Full time

    Lead Engineer. Location: Jersey City, NJ. BLT is seeking a dedicated and professional Lead Engineer to join our luxury residential community in Jersey City, NJ. The Lead Engineer will oversee all mechanical/electrical operations, installation, and maintenance of equipment, systems, and facilities, ensuring they operate safely, efficiently, and reliably. This...