Current jobs related to Site Reliability Engineer - Palo Alto, California - General Motors
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States
X (formerly Twitter)
Full time
About the Role: We're seeking a highly skilled Site Reliability Engineer to join our Command Center Team at X (formerly Twitter). As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and reliability of our services, which are used by millions of users worldwide. Key Responsibilities: Triage and troubleshoot complex...
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States
Criteo
Full time
About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available systems to support our growing...
-
Site Reliability Engineer
1 month ago
Palo Alto, California, United States
Criteo
Full time
About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications. Key Responsibilities: Design, develop, and maintain scalable and reliable software systems. Collaborate with...
-
Site Reliability Engineer
1 month ago
Palo Alto, California, United States
Criteo
Full time
About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications. Key Responsibilities: Design, develop, and maintain scalable and highly available systems and...
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States
Mistral AI
Full time
About Mistral AI: Mistral AI is a leading innovator in the field of open-source large language models. Our mission is to make AI ubiquitous and open, bridging the gap between technology and businesses of all sizes. Job Summary: We are seeking a highly experienced Site Reliability Engineer to shape the reliability, scalability, and performance of our platform and...
-
Staff Site Reliability Engineer
1 month ago
Palo Alto, California, United States
General Motors
Full time
About the Role: At General Motors, we're committed to innovation and excellence in all aspects of our business. As a Staff Site Reliability Engineer, you'll play a critical role in ensuring the reliability and scalability of our software systems. You'll work closely with cross-functional teams to design, implement, and maintain high-quality software solutions...
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States
General Motors
Full time
Job Description: At General Motors, we are pioneering next-generation software solutions for commercial fleet owners and their drivers. As a Site Reliability Engineer, you will play a critical role in improving the reliability, scalability, and operability of our production system. Responsibilities: Lead the Site Reliability engineering effort to improve anomaly...
-
Site Reliability Engineer
1 month ago
Palo Alto, California, United States
Rubrik
Full time
About The Role: Rubrik is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and availability of our cloud-based data security platform. You will work closely with our development team to identify and resolve issues, and collaborate with our operations team...
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States
Rubrik
Full time
About the Role: Rubrik is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with minimal...
-
Site Reliability Engineer
6 days ago
Palo Alto, California, United States
Rubrik
Full time
About The Role: As a Site Reliability Engineer at Rubrik, you will play a critical role in ensuring the smooth operation of our infrastructure services. You will work closely with product managers, designers, and other engineers to define the next generation of products for Rubrik. Key Responsibilities: Ensure high availability and durability of our...
-
Senior CDN Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States
X (formerly Twitter)
Full time
About X: X is a global digital public square, committed to protecting freedom of speech and building the future of unlimited interactivity. Our mission is to empower every user to freely create and share ideas, fostering open public discourse without barriers. Job Summary: We're seeking a highly motivated Senior/Staff CDN Site Reliability Engineer to join our...
-
Site Reliability Engineering Team Lead
2 weeks ago
Palo Alto, California, United States
Plume Design Inc
Full time
Job Title: Technical Manager, Site Reliability Engineering
We're seeking a seasoned Technical Manager with expertise in customer-facing environments to lead our Site Reliability Engineering Team. This team focuses on deployments, fixes, and sustainability. The ideal candidate will have strong technical knowledge in key areas while prioritizing customer...
-
Technical Manager
6 days ago
Palo Alto, California, United States
Plume
Full time
Job Overview: At Plume, we're seeking a seasoned Technical Manager to lead our Site Reliability Engineering Team. This team is responsible for ensuring the smooth operation of our cloud infrastructure, deploying new features, and resolving production issues. The ideal candidate will have a strong technical background, experience managing teams, and excellent...
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States
Criteo
Full time
About the Role: Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and highly available systems that support our business-critical applications. You will work closely with our engineering teams to identify and resolve...
-
Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States
Rubrik
Full time
About The Role: As a Site Reliability Engineer at Rubrik, you will play a critical role in ensuring the smooth operation of our infrastructure services. This includes maintaining high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with...
-
Staff Site Reliability Engineer, Fleetnet
3 weeks ago
Palo Alto, California, United States
Tesla
Full time
About Tesla: Tesla is a pioneering electric vehicle and clean energy company that's revolutionizing the way we think about transportation and energy. We're a team of innovators, engineers, and problem-solvers who are passionate about making a difference. Job Summary: We're seeking a highly skilled Staff Site Reliability Engineer to join our Fleetnet team. As a...
-
Site Reliability Engineer, AI Infrastructure
4 weeks ago
Palo Alto, California, United States
Tesla
Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our AI Infrastructure team at Tesla. As a Site Reliability Engineer, you will be responsible for maintaining and improving our platform to ensure our Full-Self-Driving (FSD), Tesla Bot & Dojo engineering teams have the necessary tools and resources to be productive. Key...
-
Staff Site Reliability Engineer, PLM Operations
2 weeks ago
Palo Alto, California, United States
Tesla
Full time
Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our PLM Operations team at Tesla. As a key member of our team, you will be responsible for ensuring the reliability and performance of our PLM systems, which are critical to the success of our engineering design tools. As a Site Reliability Engineer, you will work closely with our...
-
Site Reliability Engineer, AI Infrastructure
2 weeks ago
Palo Alto, California, United States
Tesla
Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our AI Infrastructure team at Tesla. As a key member of our team, you will be responsible for maintaining and improving our platform to ensure our Full-Self-Driving (FSD), Tesla Bot & Dojo engineering teams have the necessary tools and resources to be productive. Key...
-
Staff Site Reliability Engineer, PLM Operations
4 weeks ago
Palo Alto, California, United States
Tesla
Full time
About the Role: We are seeking a highly skilled Staff Site Reliability Engineer to join our PLM Operations team at Tesla. As a key member of our team, you will be responsible for ensuring the reliability and performance of our PLM systems, which are critical to the success of our engineering design tools. Key Responsibilities: Define Service Level Objectives...
Site Reliability Engineer
2 months ago
We are seeking a highly skilled Site Reliability Engineer to join our team at General Motors. As a Site Reliability Engineer, you will play a critical role in ensuring the stability, scalability, and reliability of our software-defined vehicle solutions.
Key Responsibilities
- Lead the Site Reliability engineering effort to improve anomaly detection, platform stability, and resilience using modern best practices.
- Partner with engineering and customer success teams to ensure comprehensive monitoring and incident response and management processes are in place.
- Help create a culture of accountability and ownership of excellent customer experience.
- Implement scalable, reliable, secure SRE and Observability platforms to monitor the health of our production system and provide a holistic view of the environment.
- Deliver tools and software to improve the reliability, scalability, and operability of services.
- Collaborate with engineering teams to analyze and provide input on architecture, infrastructure resources, and observability to achieve reliability and scalability goals.
- Collaborate with engineering teams to conduct production readiness reviews, deployment, operation, and refinement.
- Partner with stakeholders to ensure data and observability tools are effectively integrated with other systems and processes.
- Partner with stakeholders to identify, measure, and monitor availability, latency, and overall service health.
- Participate in on-call engineering duty to support production.
- Instill Site Reliability best practices through automation, data insights, and observability.
- Perform initial incident root cause analysis with engineers and carry out incident postmortems.
- Build run books and tooling to carry out production support activities.
- Actively participate in technical discussions and deep dives with Architectural groups.
Qualifications
- 7+ years of hands-on SRE experience (software development, systems monitoring) with at least one of the public cloud providers: Azure (strongly preferred), AWS, GCP.
- Experience operating high-availability, fault-tolerant, scalable, distributed software in production: Building monitoring, defining alerts, writing run books, establishing dashboards, etc.
- Experience with monitoring and log aggregation frameworks, such as Azure Monitor/Sentinel, Datadog (preferred), Dynatrace, Elasticsearch, Kibana, Logstash.
- Strong working knowledge of Docker, Kubernetes, Terraform, Chef, or Ansible.
- Experience troubleshooting JVM-based applications.
- Chaos engineering implementation experience is a big plus.
- Strong experience in scripting/programming – Python, Java, PowerShell, Bash.
- Experience with configuration and management of SSO and Big Data/NoSQL systems in cloud infrastructure.
- CI/CD automation frameworks knowledge – Jenkins/Azure DevOps.
- Strong understanding of public cloud networking components.
- You have a story to tell about how you led and influenced a cross-organization effort to improve uptime to at least 99.99%.
- Working experience with source control management tools, such as Bitbucket, GitHub, Azure DevOps (Preferred).
- Experience with an IoT stack is a big plus.
- BS/MS in Computer Science/Engineering preferred.