Site Reliability Engineer
2 days ago
Mistral AI is a leading innovator in the field of open-source large language models. Our mission is to make AI ubiquitous and open, bridging the gap between technology and businesses of all sizes.
Job Summary
We are seeking a highly experienced Site Reliability Engineer to shape the reliability, scalability, and performance of our platform and customer-facing applications. As a key member of our team, you will work closely with software engineers and research teams to ensure our systems meet and exceed our customers' expectations.
Key Responsibilities
- Design, build, and maintain scalable, highly available, and fault-tolerant infrastructures to support our web services and ML workloads.
- Implement and improve monitoring, alerting, and incident response systems to ensure optimal system performance and minimize downtime.
- Collaborate with AI/ML researchers to develop and implement solutions that enable safe and reproducible model-training experiments.
- Drive continuous improvement in infrastructure automation, deployment, and orchestration using tools like Kubernetes, Flux, and Terraform.
Requirements
- Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in a DevOps/SRE role.
- Strong experience with cloud computing and highly available distributed systems.
- Exposure to site reliability issues in critical environments.
- Experience working against reliability KPIs (observability, alerting, SLAs).
Benefits
- A fun, young, multicultural team and collaborative work environment.
- Competitive salary and bonus structure.
- Comprehensive benefits package.
- Opportunities for professional growth and development.
-
Site Reliability Engineer
3 days ago
Palo Alto, California, United States X (formerly Twitter) Full time
About the Role
We're seeking a highly skilled Site Reliability Engineer to join our Command Center Team at X (formerly Twitter). As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and reliability of our services, which are used by millions of users worldwide.
Key Responsibilities
Triage and troubleshoot complex...
-
Site Reliability Engineer
3 days ago
Palo Alto, California, United States Criteo Full time
About the Role
Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure.
Key Responsibilities
Design, implement, and maintain scalable and highly available systems to support our growing...
-
Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States Criteo Full time
About the Role
Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications.
Key Responsibilities
- Design, develop, and maintain scalable and reliable software systems
- Collaborate with...
-
Site Reliability Engineer
3 weeks ago
Palo Alto, California, United States Criteo Full time
About the Role
Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our infrastructure and applications.
Key Responsibilities
Design, develop, and maintain scalable and highly available systems and...
-
Staff Site Reliability Engineer
3 weeks ago
Palo Alto, California, United States General Motors Full time
About the Role
At General Motors, we're committed to innovation and excellence in all aspects of our business. As a Staff Site Reliability Engineer, you'll play a critical role in ensuring the reliability and scalability of our software systems. You'll work closely with cross-functional teams to design, implement, and maintain high-quality software solutions...
-
Site Reliability Engineer
1 day ago
Palo Alto, California, United States General Motors Full time
Job Description
At General Motors, we are pioneering next-generation software solutions for commercial fleet owners and their drivers. As a Site Reliability Engineer, you will play a critical role in improving the reliability, scalability, and operability of our production system.
Responsibilities:
Lead the Site Reliability engineering effort to improve anomaly...
-
Staff Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States General Motors Full time
Job Description
At General Motors, we're revolutionizing the automotive industry with software-defined vehicles. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability, scalability, and security of our production systems.
Responsibilities
Lead the Site Reliability engineering effort to improve anomaly detection, platform...
-
Site Reliability Engineer
3 weeks ago
Palo Alto, California, United States Rubrik Full time
About The Role:
Rubrik is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and availability of our cloud-based data security platform. You will work closely with our development team to identify and resolve issues, and collaborate with our operations team...
-
CDN Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States X (formerly Twitter) Full time
About X
X is a global digital public square, committed to protecting freedom of speech and building the future of unlimited interactivity. Our mission is to empower every user to freely create and share ideas, fostering open public discourse without barriers.
Job Summary
We are seeking a highly motivated CDN Site Reliability Engineer to join our Edge Services...
-
Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States Rubrik Full time
About The Role
Rubrik is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, as well as establishing best practices for internal teams to write performant SQL queries.
Key Responsibilities
Ensure high availability and...
-
Senior Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States Rubrik Full time
About the Role
Rubrik is seeking a Senior Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades minimizing downtime...
-
Senior Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States Rubrik Full time
About the Role
Rubrik is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a Senior Site Reliability Engineer, you will be responsible for ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with...
-
Site Reliability Engineer
2 days ago
Palo Alto, California, United States Rubrik Full time
About the Role
Rubrik is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with minimal...
-
Senior CDN Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States X (formerly Twitter) Full time
About X
X is a global digital public square, committed to protecting freedom of speech and building the future of unlimited interactivity. Our mission is to empower every user to freely create and share ideas, fostering open public discourse without barriers.
Job Summary
We're seeking a highly motivated Senior/Staff CDN Site Reliability Engineer to join our...
-
Site Reliability Engineering Team Lead
1 day ago
Palo Alto, California, United States Plume Design Inc Full time
Job Title: Technical Manager, Site Reliability Engineering
We're seeking a seasoned Technical Manager with expertise in customer-facing environments to lead our Site Reliability Engineering Team. This team focuses on deployments, fixes, and sustainability. The ideal candidate will have strong technical knowledge in key areas while prioritizing customer...
-
Site Reliability Engineer
3 weeks ago
Palo Alto, California, United States Rubrik Full time
About The Role
The Rubrik Engineering team is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and durability of our databases, as well as establishing best practices for internal teams to write performant SQL queries.
Key Responsibilities
Ensure high...
-
Site Reliability Engineer
1 day ago
Palo Alto, California, United States Criteo Full time
About the Role
Criteo is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and highly available systems that support our business-critical applications.
You will work closely with our engineering teams to identify and resolve...
-
Site Reliability Engineer
4 weeks ago
Palo Alto, California, United States Rubrik Full time
About The Role
Rubrik is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the smooth operation of our infrastructure services, ensuring high availability and durability of our databases, and driving reliability, availability, and efficiency improvements to our Polaris Cloud...
-
Site Reliability Engineer
2 weeks ago
Palo Alto, California, United States Rubrik Full time
About The Role:
As a Site Reliability Engineer at Rubrik, you will play a critical role in ensuring the smooth operation of our infrastructure services. This includes maintaining high availability and durability of our databases, establishing best practices for internal teams to write performant SQL queries, and performing periodic database upgrades with...
-
Staff Site Reliability Engineer, Fleetnet
1 week ago
Palo Alto, California, United States Tesla Full time
About Tesla
Tesla is a pioneering electric vehicle and clean energy company that's revolutionizing the way we think about transportation and energy. We're a team of innovators, engineers, and problem-solvers who are passionate about making a difference.
Job Summary
We're seeking a highly skilled Staff Site Reliability Engineer to join our Fleetnet team. As a...