Current jobs related to Lead Site Reliability Engineer - San Diego, California - Platform Science


  • San Diego, California, United States ACL Digital Full time

    Job Description. Duration: 0-12 months. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at ACL Digital. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based applications. Key Responsibilities: Hands-on application management and support for AWS...


  • San Diego, California, United States Becton, Dickinson & Company Full time

    About the Role: A Site Reliability Engineering Manager at Becton, Dickinson & Company is responsible for ensuring the smooth operation of complex systems and services. They oversee a team of Site Reliability Engineers to maintain infrastructure, handle incident response, and implement continuous improvement initiatives. Key Responsibilities: Lead a team of Site...


  • San Diego, California, United States Qualcomm Full time

    Job Title: Site Reliability Engineer. Join Qualcomm as a Site Reliability Engineer and be part of a highly collaborative team focused on provisioning and maintaining infrastructure and services with stability, sustainability, and security always on your mind. About the Role: We are seeking a skilled Site Reliability Engineer to join our team. As a Site...


  • San Diego, California, United States Qualcomm Full time

    Job Title: Site Reliability Engineer. At Qualcomm, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the stability, scalability, and security of our infrastructure and services. Key Responsibilities: Monitor system health and detect anomalies. Investigate and...


  • San Diego, California, United States Qualcomm Full time

    Job Title: Site Reliability Engineer. At Qualcomm, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the stability, sustainability, and security of our infrastructure and services. Key Responsibilities: Monitor system health and detect anomalies to prevent service...


  • San Jose, California, United States VDart Full time

    Job Title: Lead Site Reliability Engineer. Job Summary: VDart is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and availability of our software systems. Key Responsibilities: Design and implement automation scripts to improve operational...


  • San Diego, California, United States Insight Global Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Insight Global. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability and performance of our cloud-based systems. Key Responsibilities: Design and implement scalable and highly available cloud...


  • San Diego, California, United States Commserve Technologies Inc Full time

    Job Title: Site Reliability Engineer. At Commserve Technologies Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our enterprise-level applications. Key Responsibilities: Configure, architect, and maintain...


  • San Diego, California, United States BAE Systems USA Full time

    Job Description: At BAE Systems USA, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the seamless delivery of our cloud-based services. Key Responsibilities: Work collaboratively with cross-functional teams to design, implement, and maintain scalable and reliable...


  • San Diego, California, United States BAE Systems USA Full time

    Job Description: BAE Systems USA is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based systems. Key Responsibilities: Design and implement robust automation solutions to streamline infrastructure deployment and...


  • San Diego, California, United States BAE Systems USA Full time

    Job Description: At BAE Systems USA, we're seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you'll play a critical role in ensuring the seamless delivery of our cloud-based services. Your expertise in cloud technologies, service lifecycle management, and infrastructure automation will be instrumental in driving our...


  • San Diego, California, United States BD Full time

    Job Description: A Site Reliability Engineering Manager is responsible for ensuring the smooth operation of systems and services at scale. This role involves leading a team of Site Reliability Engineers to maintain infrastructure, handle incident response, and improve system reliability and performance. Key Responsibilities: Lead and mentor a team of SREs to...


  • San Diego, California, United States BD Full time

    Job Title: Site Reliability Engineering Manager. Job Summary: A Site Reliability Engineering Manager is responsible for ensuring that systems and services run smoothly, reliably, and efficiently at scale. They manage a team of SREs to maintain infrastructure, handle incident response, and improve the system's reliability and performance. Key...


  • San Diego, California, United States BD (Becton, Dickinson and Company) Full time

    Job Description Summary: A Site Reliability Engineering Manager is responsible for ensuring that systems and services run smoothly, reliably, and efficiently at scale. They manage a team of SREs to maintain infrastructure, handle incident response, and improve the system's reliability and performance. Below is a comprehensive job description for an SRE...


  • San Diego, California, United States BAE SYSTEMS Full time

    Job Description: At BAE Systems, we're pushing the boundaries of innovation in the field of Site Reliability Engineering. We're seeking a highly skilled and motivated individual to join our team as a Site Reliability Engineer, where you'll play a critical role in ensuring the seamless delivery of our cloud-based solutions. Key Responsibilities: Deliver...


  • San Diego, California, United States Apple Full time

    Reliability Engineer Lead. At Apple, we're committed to delivering exceptional products and services that exceed our customers' expectations. As a Reliability Engineer Lead, you'll play a critical role in ensuring the reliability and quality of our silicon components. Key Responsibilities: Drive requirements and execution of products, process, and package...


  • San Francisco, California, United States Meraki Full time

    About the Role: Cisco Meraki is a cloud-managed IT company and leader in cloud-controlled Wi-Fi, routing, and security. We're challenging the status quo with the power of diversity, inclusion, and collaboration. Job Summary: We're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for...


Lead Site Reliability Engineer

2 months ago


San Diego, California, United States Platform Science Full time

Company Overview

At Platform Science, we are dedicated to revolutionizing connectivity in the transportation sector. Established in 2015, our open IoT platform collaborates with forward-thinking fleets, application developers, vehicle manufacturers, and equipment providers to deliver groundbreaking solutions for supply chain professionals worldwide.

Our workforce is a vibrant and diverse collective that champions innovative ideas. We prioritize hiring individuals with varied experiences and viewpoints to cultivate a company culture that thrives on growth through creativity. We believe in thoughtful actions and empathy, tackling challenges with resilience and ingenuity while promoting transparency across all levels.

Position Summary

We are seeking an experienced Senior Site Reliability Engineer to enhance our operational capabilities. This role focuses on resolving operational challenges and supporting development teams that run business-critical applications in production. Our primary goal is to maintain reliability across all production services and to give development teams the means to assess the reliability of their own services so they can make informed decisions.

The SRE team has the unique privilege of engaging with all facets of our platform, which operates entirely in the cloud (AWS, Azure, and GCP). Our applications and services utilize containerization and serverless architectures. If you are enthusiastic about exploring and supporting new technologies across various products—including mobile applications, hardware, websites, messaging queues, and serverless pipelines—while collaborating with a highly skilled team, this role is ideal for you.

Key Responsibilities

  • Develop and refine Continuous Integration/Continuous Deployment (CI/CD) pipelines, enhancing release management processes and associated tools.
  • Maintain Helm charts to facilitate application deployment and management.
  • Implement standardized observability solutions to enable development teams to manage their applications effectively.
  • Champion reliability initiatives, striving to meet uptime objectives and mentoring peers in SRE best practices.
  • Conduct thorough Production Readiness Reviews, collaborating with teams to establish Service Level Indicators and Service Level Objectives (SLIs/SLOs) to ensure high-quality services.
  • Design and implement software solutions to tackle operational challenges, enhancing system stability and reliability.
  • Participate in on-call duties, providing expert support to development teams for mission-critical applications in production.
  • Enhance application and system resilience through chaos engineering techniques.

Qualifications

  • Minimum of 5 years of hands-on experience in SRE or Platform Engineering roles.
  • Proven expertise (2+ years) with automation tools such as Jenkins, ArgoCD, or similar technologies.
  • Experience with Kubernetes (2+ years), Helm, and Docker in production settings.
  • Strong understanding of software development lifecycle (SDLC) concepts, CI/CD pipelines, and test-driven development.
  • Proficient in AWS, with knowledge of EKS, IAM, autoscaling, networking, and load balancing in production environments.
  • Skilled in programming languages such as Python, Bash, Node.js, and/or Go.
  • Familiarity with distributed tracing methodologies and observability tools like Prometheus, ELK, or Datadog.
  • Emphasis on documentation and promoting knowledge-sharing within the team and organization.
  • Demonstrated success in training and mentoring engineers.
  • Expertise in optimizing performance and managing costs in cloud environments.
  • Solid understanding of SLI/SLO concepts and adherence to SRE best practices.
  • Bachelor's degree in Computer Science or a related field.

Benefits Overview

Platform Science offers a comprehensive benefits package for regular, full-time employees, including:

  • Medical, dental, and vision insurance.
  • Short-term and long-term disability insurance.
  • Life and AD&D insurance.
  • 401(k) retirement plan.
  • Paid vacation, sick leave, and holidays.
  • Six weeks of paid parental leave.