Staff Site Reliability Engineer, Platform Engineering

3 days ago


Fremont, California, United States Tesla Full time
Job Description

Tesla's Platform Engineering team is seeking a highly skilled Site Reliability Engineer. In this role, you will build and maintain our Kubernetes clusters using infrastructure-as-code and deployment tools such as Ansible, Terraform, ArgoCD, and Helm, and you will work closely with application teams to ensure their success on our platform.

Responsibilities
  • Collaborate with developers to deploy applications and provide support
  • Design and implement new features that improve platform stability and streamline platform updates
  • Manage Kubernetes clusters on-prem and in the cloud to support growing workloads
  • Participate in architecture design and troubleshooting of live applications with product teams
  • Participate in a 24x7 on-call rotation, including a weekday shift and a weekend shift every 6-8 weeks
  • Influence architectural decisions with a focus on security, scalability, and high performance
  • Set up and maintain monitoring, metrics, and reporting systems for fine-grained observability and actionable alerting
  • Author technical documentation for workflows, processes, and best practices
Requirements
  • Experience managing web-scale infrastructure in a production *nix environment
  • Ability to prioritize tasks and work independently with an analytical mind and a bias for action
  • Advanced or expert-level Linux administration and performance tuning skills
  • Bachelor's Degree in Computer Science, Computer Engineering, or equivalent experience or evidence of exceptional ability
  • Advanced experience with configuration management and infrastructure-as-code tools such as Ansible, Terraform, or Puppet
  • Demonstrable knowledge of the Linux operating system internals, networking stack, filesystems, resource scheduling, and process management
  • Exposure to AWS or other cloud infrastructure providers
  • Experience managing container-based workloads in production using Kubernetes or other orchestration software, along with related tooling (ArgoCD, Helm)
  • Proficiency in a high-level language such as Python, Go, Ruby, or Java
Compensation and Benefits

As a full-time Tesla employee, you are eligible for the following benefits from day 1 of hire:

  • Aetna PPO and HSA plans with $0 payroll deduction
  • Family-building, fertility, adoption, and surrogacy benefits
  • Dental (including orthodontic coverage) and vision plans with options that have a $0 paycheck contribution
  • Company-paid HSA contribution when enrolled in the High Deductible Aetna medical plan with HSA
  • Healthcare and Dependent Care Flexible Spending Accounts (FSA)
  • LGBTQ+ care concierge services
  • 401(k) with employer match, Employee Stock Purchase Plans, and other financial benefits
  • Company-paid Basic Life, AD&D, short-term, and long-term disability insurance
  • Employee Assistance Program
  • Sick and vacation time (flex time for salaried positions) and paid holidays
  • Back-up childcare and parenting support resources

Voluntary benefits include critical illness, hospital indemnity, accident insurance, theft and legal services, and pet insurance. Also available are weight loss and tobacco cessation programs, the Tesla Babies program, commuter benefits, and an employee discounts and perks program.

Expected Compensation: $168,000 - $300,000 annual salary + cash and stock awards + benefits. Pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience. The total compensation package for this position may also include other elements dependent on the position offered. Details of participation in these benefit plans will be provided if a candidate receives an offer of employment.


