Current jobs related to Site Reliability Engineering Team Lead - Palo Alto, California - Plume Design, Inc.


  • Palo Alto, California, United States Navan Group Full time

    At Navan, our vision is centered around providing a seamless user experience. We are passionate about delivering a one-stop-shop for business travelers, catering to their diverse needs and preferences. We are committed to building robust, scalable, and efficient infrastructure that ensures our services are always available when needed most. As we continue to...


  • Palo Alto, California, United States Plume Full time

    About the Job: The Technical Manager will lead a team of Site Reliability Engineers, providing technical guidance and oversight. Key responsibilities include: Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds. Attend and conduct customer meetings for project and roadmap specification. Manage growth and performance of...


  • Palo Alto, California, United States Plume Full time

    About the Company: Plume is a leader in the smart home and small business market, delivering services to over 50 million locations globally. Our software-defined network platform allows CSPs to decouple their service offerings from hardware and rapidly curate and deliver new services over a multi-vendor, open-platform architecture. We're looking for a seasoned...


  • Palo Alto, California, United States Assured Full time

    About Assured: At Assured, we modernize insurance by providing software solutions to large insurers. We empower them to win in a technology-driven world with self-service claim filing software and backend fraud detection. Job Overview: We are looking for a Site Reliability Engineer to join our team. The ideal candidate will have experience working in a...


  • Palo Alto, California, United States Tesla Full time

    Role Description: This is a challenging opportunity to work with cutting-edge technology and contribute to the development of automation tools. As a Site Reliability Engineer, you will drive root cause analysis of system failures, manage containerization technology, and maintain site performance using various tools. Expected Compensation: The estimated annual...


  • Palo Alto, California, United States Tesla Full time

    Company Overview: Tesla is a leading electric vehicle manufacturer accelerating the world's transition to sustainable energy. Our mission-critical systems enable our engineers to design and develop innovative solutions. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our Design Technology Operations team. This position will be...


  • Palo Alto, California, United States Plume Full time

    About the Role: We're looking for a seasoned Technical Manager to captain our Site Reliability Engineering Team. As a key member of our team, you'll be responsible for supervising a team of Site Reliability Engineers who provide first-line support to Customer Clouds. Key responsibilities include: Deployments, on-call, and application provisioning are some of the...


  • Palo Alto, California, United States Luma AI Full time

    Company Overview: Luma AI is a pioneering company in the field of multimodal AI, aiming to expand human imagination and capabilities. Our mission is to build systems that can see, understand, show, and explain, ultimately interacting with our world to effect change. Job Description: We are seeking a highly skilled Reliability Engineer to join our infrastructure...


  • Palo Alto, California, United States Wing Inflatables, Inc. Full time

    Role Overview: Wing is seeking a highly experienced Design Reliability Engineer to join our Design for Excellence team in Palo Alto, California. As a key contributor to ensuring the reliability and robustness of our hardware designs, you will leverage your deep understanding of testing methodologies and reliability engineering principles to drive significant...


  • Palo Alto, California, United States Plume Design, Inc. Full time

    About the Team: Our Site Reliability Engineering Team is focused on deployments, fixes, and sustainability. We're looking for a seasoned Technical Manager to lead this team and drive excellence in customer satisfaction. The ideal candidate should have strong technical knowledge in key areas, including production troubleshooting, team management, and technical...


  • Palo Alto, California, United States Tesla Full time

    About the Role: We are seeking a talented Semiconductor Reliability Assurance Lead to join our team at Tesla. In this role, you will be responsible for leading the development of reliability guidelines and ensuring the reliability of our semiconductor-based systems. Key Responsibilities: This is a challenging role that requires a strong technical foundation in...


  • Palo Alto, California, United States Testing Solutions GmbH Full time

    Unlock the Future of Multimodal AI: Luma AI is revolutionizing the field of artificial intelligence by pushing beyond language models and developing more aware, capable, and useful systems. As a Senior Software Engineer in our Reliability team, you will play a critical role in defining, measuring, and improving the reliability of our GPU clusters. Our SRE team...


  • Palo Alto, California, United States Tesla Full time

    About the Role: Tesla is looking for a highly motivated Reliability Engineering Professional to join our team. As a key member of our engineering group, you will play a crucial role in ensuring the reliability of our innovative products. This position offers an exciting opportunity to contribute to the development of cutting-edge technology and shape the...


  • Palo Alto, California, United States Amazon Full time

    Amazon Web Services (AWS) is the world's most comprehensive and broadly adopted cloud platform, providing a robust suite of products and services to power businesses. We are seeking an experienced Reliable Database Systems Engineer to join our team. This role requires expertise in designing, implementing, and maintaining large-scale database systems that...


  • Palo Alto, California, United States Tesla Full time

    About the Job: We are looking for an experienced Site Reliability Engineer to join our team. Your responsibilities will include building release processes, managing Kubernetes infrastructure, and maintaining site performance. You will also participate in on-call rotations and facilitate production and security incidents. Required Skills: To succeed in this role,...


  • Palo Alto, California, United States Luma AI Full time

    Job Overview: Luma AI is seeking a highly skilled Reliability Solutions Engineer to join our team. As a key member of our Infrastructure and Research teams, you will be responsible for ensuring the health and reliability of our GPU clusters. We are looking for someone with a strong background in cloud infrastructure, containerization, and...


  • Palo Alto, California, United States salesforce, inc. Full time

    About the Role: We are seeking an exceptional Software Engineering Team Lead to drive the execution and delivery of features for our engineering teams. As a key member of our team, you will be responsible for collaborating with multi-functional teams, architects, product owners, and engineers to ensure successful project outcomes. Your Impact: Drive critical...


  • Palo Alto, California, United States KOHLER Full time

    We're looking for a seasoned Software Engineering Team Lead to join our team at Kohler Ventures in Palo Alto, CA or New York City, NY. As a key member of our engineering team, you'll oversee multiple projects, manage a team of Cloud Engineers, and ensure timely delivery and high-quality results. The ideal candidate will have 5+ years of experience in leading...


  • Palo Alto, California, United States Plume Design, Inc. Full time

    About the Role: This leadership position requires a seasoned Technical Manager with advanced technical knowledge and working experience in various areas, including: Kubernetes (operation); basic Terraform knowledge; experience with modern cloud infrastructure, preferably AWS; experience with modern Linux operating systems (Enterprise Linux or Debian-based); experience...


  • Palo Alto, California, United States Databook Full time

    Opportunity: As the Head of Data Engineering and Architecture, you will play a pivotal role by driving the vision and execution of our data capabilities. You have a successful track record demonstrating the business value of data engineering projects by tying them to new customer-facing products. You will work closely with Data Scientists, Product Managers, ML...

Site Reliability Engineering Team Lead

1 month ago


Palo Alto, California, United States Plume Design, Inc. Full time

We're looking for a seasoned Technical Manager with extensive experience in customer-facing environments to lead our Site Reliability Engineering Team. This team is focused on deployments, fixes, and sustainability.

The ideal candidate will have strong technical knowledge in key areas while focusing on customer satisfaction.

Key Responsibilities
  • Supervise a team of Site Reliability Engineers who provide first-line support to Customer Clouds, handling routine tasks such as deployments, on-call rotations, and application provisioning.
  • Attend and conduct customer meetings for project and roadmap specifications.
  • Manage the growth and performance of SRE team members, and take a hands-on role in executing or triaging issues.
  • Contribute improvements to the current automation, on-call process, and alerting systems.
  • Play a key role in the recruitment and retention of top talent.
Requirements
  • Availability to join the on-call rotation for production issues.
  • Availability to work with a distributed team across different time zones.
  • Advanced communication skills.
  • Experience managing people.
Desired Skills
  • 10+ years of experience with production troubleshooting.
  • 5+ years of experience leading or managing teams.
  • Bachelor's degree in a related field or equivalent experience; an advanced degree is preferred.
  • Technical knowledge and working experience with:
    • Kubernetes (operational expertise).
    • Basic Terraform knowledge.
    • Programming/scripting experience in languages like Perl, Python, PHP, GoLang, Java, etc.
    • Modern cloud infrastructure, preferably AWS.
    • Modern Linux operating systems (Enterprise Linux or Debian-based).
    • Self-managed monitoring and observability tools (e.g., Nagios/Icinga, Grafana, Prometheus).
Differentiators
  • Troubleshooting production performance/service degradation or outage issues at scale.
  • Infrastructure troubleshooting in VMs and/or bare metal (ssh/Linux).
  • Advanced Kubernetes knowledge.
  • Advanced Terraform knowledge.
  • Customer-facing experience in previous roles.
  • Operating Kafka in production.
  • Operating NoSQL databases in production.
  • Operating relational databases in production.
  • Configuration management experience.

This position requires a hybrid schedule, with the employee expected to come into our Palo Alto, CA office three days a week. Candidates must be within commuting distance. We are not offering relocation at this time.

Total compensation includes an anticipated base salary range of $181,000 - $213,000, plus bonus, equity, and benefits. Benefits include a 401(k) plan, basic life insurance, health, dental, and vision coverage, and other perks. An employee's base salary and its position within the range may depend on job-related knowledge, education, skills, experience, and other business considerations.