Site Reliability Engineer

5 days ago


Cupertino, California, United States Juniper Networks Full time
Job Title: Site Reliability Engineer

Juniper Networks is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our cloud infrastructure.

Key Responsibilities:
  • Maintain system availability, health, and service levels (SLAs, SLOs) of large-scale cloud infrastructure running in AWS and GCP.
  • Support infrastructure components, data streaming frameworks, and databases, such as Kubernetes, Flink, Storm, Spark, Kafka, Cassandra, Elasticsearch, Redis, Postgres, ArangoDB, and many others.
  • Monitor, troubleshoot, analyze failures, and provide support for software engineers to debug production issues across microservices and distributed platforms.
  • Work with development teams to resolve issues found.
  • Join the on-call rotation and resolve issues in a 24x7 multi-cloud (AWS/GCP) environment.
  • Monitor metrics and performance of applications and cloud infrastructure.
  • Handle the entire incident management lifecycle: reporting, analyzing, and handling incidents through closure, and writing RCAs.
  • Write and update runbooks for knowledge-driven automated processes and bots.
  • Perform capacity planning based on performance, usage, and utilization stats.
  • Follow SRE best practices and procedures.
Requirements:
  • Bachelor's degree in computer science, computer engineering, or equivalent.
  • 5+ years of DevOps/SRE experience.
  • 3+ years of hands-on experience with AWS or GCP, EC2 (GCE), IAM, S3 (GCS), Docker, Kubernetes pods, Jenkins, Prometheus, CloudWatch (Stackdriver), Linux, Ansible.
  • 3+ years' experience in deploying code and infrastructure in AWS or GCP using continuous integration/continuous delivery (CI/CD) tools in production environments.
  • 5+ years of administration experience with distributed computation and streaming frameworks, like Kafka, Cassandra, Elasticsearch, Flink, Storm, Spark, and cloud services such as EMR, Dataproc, ElastiCache, AWS RDS, GCP SQL, or similar.
  • 5+ years of automation using Python, Golang, and/or Rust, plus shell scripting.
  • 5+ years of prior experience in developing metrics to monitor the health of infrastructure and applications.
  • Good understanding of Terraform, CloudFormation, or another IaC tool is preferred.
Preferred Skills:
  • Any open-source development experience.
  • AIOps/GenAI experience.
  • Automation using workflow services such as GitHub Actions, Google Workflows, Jenkins, GitLab, Slack, and Confluence/Jira.
  • Microservices release operations experience.
Compensation:

The pay range for this position is expected to be between $140,800.00 and $202,400.00/year; however, the base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.

The total compensation package for this position also includes medical benefits, 401(k) eligibility, vacation, sick time, and parental leave. Additional details of participation in these benefit plans will be provided if an employee receives an offer of employment.

If hired, the employee will be in an 'at-will' position, and the Company reserves the right to modify base salary (as well as any other payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.

Juniper's pay range data is provided in accordance with local state pay transparency regulations. Juniper may post different minimum wage ranges for permanent residency petitions pursuant to US Department of Labor requirements.



  • Cupertino, California, United States Apple Full time

    Job Title: Site Reliability Engineer. At Apple, we're looking for a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining our core infrastructure. This infrastructure enables thousands of Apple Developers to submit their Apps to the App Store that delight millions of...


  • Cupertino, California, United States Apple Full time

    Job Title: Site Reliability Engineer. At Apple, we're looking for a highly skilled Site Reliability Engineer to join our Cloud Service Infrastructure team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and security of our cloud services. Key Responsibilities: Operate, monitor, and prioritize our production and...


  • Cupertino, California, United States Apple Full time

    Job Description: Apple is seeking an innovative Site Reliability Engineer to join our Apple Services Engineering team. As a Site Reliability Engineer, you will play a vital role in designing, building, and maintaining our core infrastructure, enabling thousands of Apple Developers to submit their Apps to the App Store that delight millions of Apple...


  • Cupertino, California, United States Apple Full time

    Job Description: Apple is seeking an experienced Site Reliability Engineer to join our Apple Services Engineering team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining our core infrastructure, which enables thousands of Apple Developers to submit their Apps to the App Store that delight millions of Apple...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Services Engineering (ASE) team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based services. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Services Engineering (ASE) team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our global services. Key Responsibilities: Lead data-driven roadmap and quarterly planning for a subset of core services from a...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Services Engineering team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based services. Key Responsibilities: Lead data-driven roadmap and quarterly planning for a subset of core services from a...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Maps Infrastructure team. As a key member of our team, you will be responsible for designing, building, and maintaining scalable and reliable infrastructure services that support our business operations. Key Responsibilities: Collaborate with engineering, security, and SRE...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Maps Infrastructure team. As a key member of our team, you will be responsible for designing, building, and maintaining scalable and reliable cloud infrastructure services. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Services Engineering (ASE) team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our global services. Key Responsibilities: Lead data-driven roadmap and quarterly planning for a subset of core services from a...


  • Cupertino, California, United States Bayside Solutions Full time

    Job Title: Site Reliability Engineer, Virtualization and Planning. We are seeking a highly skilled Site Reliability Engineer to join our team at Bayside Solutions, Inc. This role will focus on supporting Infrastructure as a Service (IaaS) virtualization platforms, Linux compute environments, and capacity planning. Key Responsibilities: Design and implement...


  • Cupertino, California, United States Apple Full time

    Role Summary: As a Site Reliability Engineer at Apple, you will play a critical role in ensuring the reliability and scalability of our cloud services. You will be responsible for designing, building, and implementing innovative solutions to improve the stability, security, and scalability of our cloud systems. Key Responsibilities: Operate, monitor, and...


  • Cupertino, California, United States Apple Full time

    Job Title: Site Reliability Engineering Manager. At Apple, we're looking for a talented Site Reliability Engineering Manager to join our team. As a Site Reliability Engineering Manager, you will be responsible for designing, developing, and operating Fleet Management Services, including core infrastructure to provide fast, secure, and reliable data center...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Apple. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based services. Key Responsibilities: Lead data-driven roadmap and quarterly planning for a subset of core services from a reliability...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Service Engineering - Solr SRE team. As a key member of our team, you will be responsible for developing processes, tools, and automation for managing distributed systems in production environments. Key Responsibilities: Design and implement scalable search infrastructure...


  • Cupertino, California, United States Juniper Networks Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Juniper Networks. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. Key Responsibilities: Maintain system availability, health, and service levels (SLAs, SLOs) of...


  • Cupertino, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Apple Service Engineering team. As a key member of our Solr SRE team, you will be responsible for developing processes, tools, and automation for managing distributed systems in production environments. Key Responsibilities: Design and implement scalable search infrastructure...