Sr. Site Reliability Engineer, DevOps

2 weeks ago


Carlsbad, CA, United States ATEC Spine Full time

The Senior Site Reliability Engineer (SRE) will be responsible for ensuring the availability, performance, scalability, and operational efficiency of the Informatix cloud platform. This role is focused on reducing manual operations work (toil), automating system reliability, and ensuring production-grade observability. The ideal candidate is a systems-focused engineer who is passionate about uptime, incident response, and continuous improvement through engineering solutions.

Essential Duties And Responsibilities

  • Serve as a primary contributor to the on-call rotation to maintain 24/7 uptime for production systems.
  • Proactively monitor and continuously improve SLAs, SLOs, and SLIs across critical services.
  • Develop and maintain robust observability tooling including logging, metrics, and tracing (e.g., Azure Monitor, OpenTelemetry, Prometheus).
  • Proactively conduct postmortems and root cause analysis; implement fixes to prevent repeat incidents.
  • Identify and eliminate manual operational toil through scripting and automation.
  • Design and maintain automated incident detection and response systems.
  • Establish and maintain runbooks, playbooks, and escalation protocols for system support.
  • Contribute to chaos testing and failure injection to proactively uncover weaknesses.
  • Promote a culture of operational excellence through data-driven reliability practices.
  • Proactively communicate status.

Requirements
The requirements listed below are representative of the knowledge, skill, and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

  • 5+ years of experience in Site Reliability Engineering, systems engineering, or DevOps roles.
  • Expertise in monitoring and observability platforms (e.g., Grafana, Prometheus, ELK, Azure Monitor).
  • Solid background in incident response, root cause analysis, and on-call rotations.
  • Deep knowledge of Microsoft Azure, including containerized services (AKS), networking, and storage.
  • Strong automation and scripting experience (e.g., Python, Bash, PowerShell).
  • Familiarity with IaC tools such as Terraform, Bicep, or ARM templates.
  • Experience implementing SLIs/SLOs, operational dashboards, and error budgets.
  • Comfortable designing for resiliency, failover, and graceful degradation.
  • Knowledge of compliance frameworks (e.g., SOC 2, HITRUST, IEC) is a plus.
  • Strong written and verbal communication with a focus on transparency and learning.

Education And Experience

  • BS/MS in Computer Science, Engineering, or related technical field preferred.
  • 5+ years in production engineering roles with direct ownership of critical systems.
  • Microsoft certifications are a plus.

For roles based in the United States that require access to hospital facilities, candidates must be eligible for and maintain credentials at all required hospitals, including meeting any applicable physical requirements or vaccination requirements (including the COVID-19 vaccine, as applicable).

ATEC is committed to providing equal employment opportunities to its employees and applicants without regard to race, color, religion, national origin, age, sex, sexual orientation, gender identity, gender expression, or any other protected status in accordance with all applicable federal, state or local laws. Further, ATEC will make reasonable accommodations that are necessary to comply with disability discrimination laws.

Salary Range
Alphatec Spine, Inc. complies with state and federal wage and hour laws. Compensation depends on the candidate's qualifications, education, skill set, years of experience, and internal equity. Full-time salary range: $135,000 to $145,000.




