Site Reliability Engineer

1 day ago


Palo Alto, California, United States General Motors Full time
Job Description

At General Motors, we are pioneering next-generation software solutions for commercial fleet owners and their drivers. As a Site Reliability Engineer, you will play a critical role in improving the reliability, scalability, and operability of our production system.

Responsibilities:
  • Lead the Site Reliability engineering effort to improve anomaly detection, platform stability, and resilience using modern best practices.
  • Partner with engineering and customer success teams to ensure comprehensive monitoring and incident response and management processes are in place.
  • Help create a culture of accountability and ownership of excellent customer experience.
What You'll Do:
  • Implement scalable, reliable, and secure SRE and Observability platforms to monitor the health of our production system and provide a holistic view of the environment.
  • Deliver tools and software to improve the reliability, scalability, and operability of services.
  • Collaborate with engineering teams to analyze and provide input on architecture, infrastructure resources, and observability to achieve reliability and scalability goals.
  • Collaborate with engineering teams on production readiness reviews, deployment, operation, and refinement.
  • Partner with stakeholders to ensure data and observability tools are effectively integrated with other systems and processes.
  • Partner with stakeholders to identify, measure, and monitor availability, latency, and overall service health.
  • Participate in on-call engineering duty to support production.
  • Instill Site Reliability best practices through automation, data insights, and observability.
  • Perform initial incident root cause analysis with engineers and conduct incident postmortems.
  • Build runbooks and tooling to carry out production support activities.
  • Actively participate in technical discussions and deep dives with the Architectural group.
Qualifications:
  • 7+ years of hands-on SRE experience with at least one public cloud provider (Azure, AWS, or GCP).
  • Experience operating high-availability, fault-tolerant, scalable, distributed software in production.
  • Experience with monitoring and log aggregation frameworks (Azure Monitor, Datadog, Dynatrace, Elasticsearch, Kibana, Logstash).
  • Strong working knowledge of Docker, Kubernetes, Terraform, Chef, or Ansible.
  • Experience troubleshooting JVM-based applications.
  • Hands-on chaos engineering experience is a big plus.
  • Strong experience in scripting/programming (Python, Java, PowerShell, Bash).
  • Experience configuring and managing SSO and Big Data/NoSQL systems in cloud infrastructure.
  • Knowledge of CI/CD automation frameworks (Jenkins, Azure DevOps).
  • Strong understanding of public cloud networking components.
  • Working experience with source control management tools (Bitbucket, GitHub, Azure DevOps).
  • Experience with an IoT stack is a big plus.
  • BS/MS in Computer Science/Engineering preferred.
How You Could Win Us Over:
  • You have a story to tell about how you led and influenced cross-organization efforts to improve uptime to at least 99.99%.
  • You have excellent Azure experience, from identity management and CI/CD pipelines to Azure Monitor and Application Insights.
Additional Information:

This role is categorized as Hybrid, meaning the successful candidate is expected to report onsite three times per week at minimum.

The compensation information is a good faith estimate only, based on what a successful applicant in the California Bay Area might be paid in accordance with California law.

The annual salary range for this role is $152,100 - $232,900.

The actual base salary a successful candidate will be offered within this range will vary based on factors relevant to the position.

An incentive pay program offers payouts based on company performance, job level, and individual performance.

GM offers a variety of health and wellbeing benefit programs, including medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, sickness and accident benefits, life insurance, paid vacation & holidays, tuition assistance programs, employee assistance program, GM vehicle discounts, and more.

Our vision is a world with Zero Crashes, Zero Emissions, and Zero Congestion, and we aspire to be the most inclusive company in the world.

We believe we all must make a choice every day – individually and collectively – to drive meaningful change through our words, our deeds, and our culture.

Our Work Appropriately philosophy supports our foundation of inclusion and provides employees the flexibility to work where they can have the greatest impact on achieving our goals, dependent on role needs.

Every day, we want every employee, no matter their background, ethnicity, preferences, or location, to feel they belong to one General Motors team.

The goal of the General Motors total rewards program is to support the health and well-being of you and your family.

We are committed to being a workplace that is not only free of discrimination but one that genuinely fosters inclusion and belonging.

We strongly believe that workforce diversity creates an environment in which our employees can thrive and develop better products for our customers.

GM is proud to be an equal opportunity employer.



  • Palo Alto, California, United States X (formerly Twitter) Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our Command Center Team at X (formerly Twitter). As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and reliability of our services, which are used by millions of users worldwide. Key Responsibilities: Triage and troubleshoot complex...


  • Palo Alto, California, United States Criteo Full time

    About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available systems to support our growing...


  • Palo Alto, California, United States Criteo Full time

    About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications. Key Responsibilities: Design, develop, and maintain scalable and reliable software systems. Collaborate with...


  • Palo Alto, California, United States Criteo Full time

    About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications. Key Responsibilities: Design, develop, and maintain scalable and highly available systems and...


  • Palo Alto, California, United States Mistral AI Full time

    About Mistral AI: Mistral AI is a leading innovator in the field of open-source large language models. Our mission is to make AI ubiquitous and open, bridging the gap between technology and businesses of all sizes. Job Summary: We are seeking a highly experienced Site Reliability Engineer to shape the reliability, scalability, and performance of our platform and...


  • Palo Alto, California, United States General Motors Full time

    About the Role: At General Motors, we're committed to innovation and excellence in all aspects of our business. As a Staff Site Reliability Engineer, you'll play a critical role in ensuring the reliability and scalability of our software systems. You'll work closely with cross-functional teams to design, implement, and maintain high-quality software solutions...


  • Palo Alto, California, United States General Motors Full time

    Job Description: At General Motors, we're revolutionizing the automotive industry with software-defined vehicles. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability, scalability, and security of our production systems. Responsibilities: Lead the Site Reliability engineering effort to improve anomaly detection, platform...


  • Palo Alto, California, United States Rubrik Full time

    About The Role: Rubrik is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and availability of our cloud-based data security platform. You will work closely with our development team to identify and resolve issues, and collaborate with our operations team...


  • Palo Alto, California, United States X (formerly Twitter) Full time

    About X: X is a global digital public square, committed to protecting freedom of speech and building the future of unlimited interactivity. Our mission is to empower every user to freely create and share ideas, fostering open public discourse without barriers. Job Summary: We are seeking a highly motivated CDN Site Reliability Engineer to join our Edge Services...


  • Palo Alto, California, United States Rubrik Full time

    About The Role: Rubrik is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, as well as establishing best practices for internal teams to write performant SQL queries. Key Responsibilities: Ensure high availability and...


  • Palo Alto, California, United States Rubrik Full time

    About the Role: Rubrik is seeking a Senior Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades minimizing downtime...


  • Palo Alto, California, United States Rubrik Full time

    About the Role: Rubrik is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with...


  • Palo Alto, California, United States Rubrik Full time

    About the Role: Rubrik is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with minimal...


  • Palo Alto, California, United States X (formerly Twitter) Full time

    About X: X is a global digital public square, committed to protecting freedom of speech and building the future of unlimited interactivity. Our mission is to empower every user to freely create and share ideas, fostering open public discourse without barriers. Job Summary: We're seeking a highly motivated Senior/Staff CDN Site Reliability Engineer to join our...


  • Palo Alto, California, United States Plume Design Inc Full time

    Job Title: Technical Manager, Site Reliability Engineering. We're seeking a seasoned Technical Manager with expertise in Customer Facing environments to lead our Site Reliability Engineering Team. This team focuses on deployments, fixes, and sustainability. The ideal candidate will have strong technical knowledge in key areas while prioritizing customer...


  • Palo Alto, California, United States Rubrik Full time

    About The Role: The Rubrik Engineering team is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, as well as establishing best practices for internal teams to write performant SQL queries. Key Responsibilities: Ensure high...


  • Palo Alto, California, United States Criteo Full time

    About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and highly available systems that support our business-critical applications. You will work closely with our engineering teams to identify and resolve...


  • Palo Alto, California, United States Rubrik Full time

    About The Role: Rubrik is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the smooth operation of our infrastructure services, ensuring high availability and durability of our databases, and driving reliability, availability, and efficiency improvements to our Polaris Cloud...


  • Palo Alto, California, United States Rubrik Full time

    About The Role: As a Site Reliability Engineer at Rubrik, you will play a critical role in ensuring the smooth operation of our infrastructure services. This includes maintaining high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with...


  • Palo Alto, California, United States Tesla Full time

    About Tesla: Tesla is a pioneering electric vehicle and clean energy company that's revolutionizing the way we think about transportation and energy. We're a team of innovators, engineers, and problem-solvers who are passionate about making a difference. Job Summary: We're seeking a highly skilled Staff Site Reliability Engineer to join our Fleetnet team. As a...