Reliability Engineering Team Lead
6 days ago
At Navan, our vision is centered around providing a seamless user experience. We are passionate about delivering a one-stop-shop for business travelers, catering to their diverse needs and preferences.
We are committed to building robust, scalable, and efficient infrastructure that ensures our services are always available when needed most. As we continue to grow rapidly, we are seeking a Site Reliability Engineering (SRE) Manager to join our team in Palo Alto, California.
About the Role
As an SRE Manager, you will lead a team of experienced SREs, driving innovation in infrastructure design, automation, and tooling. Your key responsibilities will include spearheading the development of infrastructure services that power Navan's systems, serving thousands of travelers daily. You will partner with development, release, and productivity teams to identify user needs and deliver cutting-edge solutions.
You will oversee a diverse range of systems and technologies, focusing on building autonomous, fault-tolerant, and monitored infrastructure. This infrastructure will be optimized for simplicity, performance, and uptime. Collaborating with backend and frontend engineering teams, you will ensure that our systems are scalable, reliable, and efficient. Additionally, you will lead efforts to design and implement infrastructure capable of supporting our exponential growth while maintaining the highest levels of service reliability and operational excellence.
Your Key Responsibilities
- Lead & Mentor the SRE Team: Guide and develop a high-performing team of SREs, fostering a culture of collaboration, reliability, and continuous improvement.
- Drive Infrastructure Reliability & Automation: Collaborate with Engineering and Product teams to design and implement scalable, fault-tolerant systems. Leverage IaC tools (e.g., Terraform, CloudFormation) and microservices architectures to automate and improve infrastructure.
- Incident Management: Improve incident response processes, reduce MTTR, and proactively mitigate risks. Apply resiliency patterns to ensure systems are fault-tolerant and highly available.
- Define & Measure SLOs: Develop service-level objectives (SLOs) and KPIs to track and improve system reliability, using tools like New Relic or Datadog for observability.
- 24x7 Production Support: Ensure system availability in a 24x7 environment, applying expertise in AWS (e.g., ECS, Lambda, DynamoDB) and database management for optimal performance.
- Optimize CI/CD Pipelines: Automate and streamline deployment workflows using tools like Jenkins or GitHub Actions to ensure faster and more reliable deployments.
- Resource Management: Manage team resources, including capacity planning, hiring, and upskilling, to meet evolving business needs.
- 8+ years in Site Reliability Engineering, DevOps, or Infrastructure roles, with at least 3 years in a leadership position.
- Proven ability to lead and mentor teams, fostering a culture of collaboration and reliability.
- Hands-on experience with AWS cloud technologies, Infrastructure as Code (Terraform/CloudFormation), microservices architectures, deployment automation (Jenkins/GitHub Actions), and observability tools (New Relic/Datadog).
- Strong background in designing scalable, fault-tolerant systems, improving incident response, and driving operational improvements.
- Excellent interpersonal and communication skills, with the ability to work effectively across cross-functional teams.
-
Reliability Engineering Lead
2 days ago
Palo Alto, California, United States Luma AI Full time
Company Overview: Luma AI is a pioneering company in the field of multimodal AI, aiming to expand human imagination and capabilities. Our mission is to build systems that can see, understand, show, and explain, ultimately interacting with our world to effect change.
Job Description: We are seeking a highly skilled Reliability Engineer to join our infrastructure...
-
Reliability Engineering Expert
13 hours ago
Palo Alto, California, United States Wing Inflatables, Inc. Full time
Role Overview: Wing is seeking a highly experienced Design Reliability Engineer to join our Design for Excellence team in Palo Alto, California. As a key contributor to ensuring the reliability and robustness of our hardware designs, you will leverage your deep understanding of testing methodologies and reliability engineering principles to drive significant...
-
Site Reliability Engineering Manager
3 weeks ago
Palo Alto, California, United States Plume Full time
About the Job: The Technical Manager will lead a team of Site Reliability Engineers, providing technical guidance and oversight. Key responsibilities include:
- Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds.
- Attend and conduct customer Meetings for Project and Roadmap specification.
- Manage growth and performance of...
-
Semiconductor Reliability Assurance Lead
3 weeks ago
Palo Alto, California, United States Tesla Full time
About the Role: We are seeking a talented Semiconductor Reliability Assurance Lead to join our team at Tesla. In this role, you will be responsible for leading the development of reliability guidelines and ensuring the reliability of our semiconductor-based systems.
Key Responsibilities: This is a challenging role that requires a strong technical foundation in...
-
Cloud Engineer Team Lead
4 weeks ago
Palo Alto, California, United States Plume Full time
About the Role: We're looking for a seasoned Technical Manager to captain our Site Reliability Engineering Team. As a key member of our team, you'll be responsible for supervising a team of Site Reliability Engineers who provide first-line support to Customer Clouds.
Key responsibilities include: Deployments, On-call, Application Provisioning are some of the...
-
Reliability Engineering Expert
2 days ago
Palo Alto, California, United States Testing Solutions GmbH Full time
Unlock the Future of Multimodal AI
Luma AI is revolutionizing the field of artificial intelligence by pushing beyond language models and developing more aware, capable, and useful systems. As a Senior Software Engineer in our Reliability team, you will play a critical role in defining, measuring, and improving the reliability of our GPU clusters. Our SRE team...
-
Reliability Engineer for Distributed Systems
3 weeks ago
Palo Alto, California, United States Tesla Full time
Company Overview: Tesla is a leading electric vehicle manufacturer accelerating the world's transition to sustainable energy. Our mission-critical systems enable our engineers to design and develop innovative solutions.
Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our Design Technology Operations team. This position will be...
-
Reliability Engineering Professional
3 weeks ago
Palo Alto, California, United States Tesla Full time
About the Role: Tesla is looking for a highly motivated Reliability Engineering Professional to join our team. As a key member of our engineering group, you will play a crucial role in ensuring the reliability of our innovative products.
This position offers an exciting opportunity to contribute to the development of cutting-edge technology and shape the...
-
Reliable Database Systems Engineer
4 days ago
Palo Alto, California, United States Amazon Full time
Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, providing a robust suite of products and services to power businesses. We are seeking an experienced Reliable Database Systems Engineer to join our team.
This role requires expertise in designing, implementing, and maintaining large-scale database systems that...
-
Reliability Solutions Engineer
3 weeks ago
Palo Alto, California, United States Luma AI Full time
Job Overview: Luma AI is seeking a highly skilled Reliability Solutions Engineer to join our team. As a key member of our Infrastructure and Research teams, you will be responsible for ensuring the health and reliability of our GPU clusters.
We are looking for someone with a strong background in cloud infrastructure, containerization, and...
-
Software Engineering Team Lead
3 weeks ago
Palo Alto, California, United States salesforce, inc. Full time
About the Role: We are seeking an exceptional Software Engineering Team Lead to drive the execution and delivery of features for our engineering teams. As a key member of our team, you will be responsible for collaborating with multi-functional teams, architects, product owners, and engineers to ensure successful project outcomes.
Your Impact: Drive critical...
-
Software Engineering Team Lead
21 hours ago
Palo Alto, California, United States KOHLER Full time
We're looking for a seasoned Software Engineering Team Lead to join our team at Kohler Ventures in Palo Alto, CA or New York City, NY. As a key member of our engineering team, you'll oversee multiple projects, manage a team of Cloud Engineers, and ensure timely delivery and high-quality results.
The ideal candidate will have 5+ years of experience in leading...
-
Data Engineering Team Lead
2 days ago
Palo Alto, California, United States Databook Full time
Opportunity: As the Head of Data Engineering and Architecture, you will play a pivotal role by driving the vision and execution of our data capabilities. You have a successful track record demonstrating the business value of data engineering projects by tying them to new customer-facing products. You will work closely with Data Scientists, Product Managers, ML...
-
Network Reliability Engineer
2 days ago
Palo Alto, California, United States Avature Full time
Job Summary: We are seeking a highly skilled Network Reliability Engineer to join our team. As a key member of our engineering group, you will be responsible for designing, implementing, and maintaining the stability and scalability of our global network infrastructure.
About the Role: In this role, you will work closely with our cross-functional teams to build...
-
Software Engineering Team Lead
18 hours ago
Palo Alto, California, United States KOHLER Full time
Job Overview: Kohler Ventures is an independent company wholly owned by Kohler Co., a global leader in kitchen and bath products, tile, home interiors, and hospitality. Our mission is to build digital businesses empowering consumers to lead healthier lives through technology, science, and design.
We're seeking a skilled Software Engineering Manager to lead our...
-
Palo Alto, California, United States Wing Inflatables, Inc. Full time
About Wing: We are a pioneer in drone delivery technology, offering a safe, fast, and sustainable solution for last mile logistics. Our mission is to create the preferred means of delivery for the planet by building a workforce that's representative of the global communities we serve.
Our Design for Excellence (DFX) team in Palo Alto, California, is seeking a...
-
Software Engineering Team Lead
4 weeks ago
Palo Alto, California, United States Kohler Full time
Kohler Ventures, a global leader in the manufacture of kitchen and bath products, tile and home interiors, and an international host to award-winning hospitality and world-class golf destinations, is seeking an experienced Sr. Android Engineer to join our team.
We empower each associate to #BecomeMoreAtKohler with a competitive total rewards package to...
-
Technical Team Lead
1 day ago
Palo Alto, California, United States Plume Design, Inc. Full time
About the Team: Our Site Reliability Engineering Team is focused on deployments, fixes, and sustainability. We're looking for a seasoned Technical Manager to lead this team and drive excellence in customer satisfaction.
The ideal candidate should have strong technical knowledge in key areas, including production troubleshooting, team management, and technical...
-
Software Engineering Team Lead
3 weeks ago
Palo Alto, California, United States Celonis GmbH Full time
About Us: Celonis GmbH is a global leader in Process Mining technology and one of the world's fastest-growing SaaS firms. We believe there is a massive opportunity to unlock productivity by placing data and intelligence at the core of business processes – and for that, we need talented individuals like you to join us.
The Role: We're looking for an...
-
Technical Site Reliability Engineering Leader
4 weeks ago
Palo Alto, California, United States Plume Full time
About the Company: Plume is a leader in the smart home and small business market, delivering services to over 50 million locations globally. Our software-defined network platform allows CSPs to decouple their service offerings from hardware and rapidly curate and deliver new services over a multi-vendor, open-platform architecture.
We're looking for a seasoned...