Principal Site Reliability Engineer

1 week ago


Dallas, Texas, United States CARE Full time
About Care.com

Care.com is a leading consumer tech company that's revolutionizing the way families find and connect with caregivers. Our mission is to solve the universal challenge of finding reliable care for loved ones. We're a team of entrepreneurs, self-starters, and innovators who share a passion for using technology to make a positive impact.

Job Summary

We're seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering organization, you'll be responsible for ensuring the reliability, scalability, and performance of our critical systems. You'll lead incident response, manage releases, improve observability, and collaborate with cross-functional teams to drive continuous improvements.

Key Responsibilities
  • Release Management: Coordinate releases for applications, ensuring efficient deployment and smooth rollbacks.
  • Incident Response: Lead incident management, facilitate root cause analysis, and continuously update response processes.
  • Monitoring & Alerting: Implement proactive monitoring, create dashboards, and set up real-time alerts for critical services.
  • Hypercare: Ensure system stability during critical post-release periods, monitoring performance and preventing incidents.
  • Collaboration with Dev & QA: Work closely with developers and QA teams to ensure performance benchmarks and observability goals are met.
  • SLI/SLA/SLO Management: Define and measure service levels for key workflows and APIs, ensuring alignment with business expectations.
  • Observability Maturity: Continuously assess and improve observability practices across teams, driving data-driven insights.
Requirements
  • 6+ years of experience in SRE or DevOps roles with a focus on monoliths and distributed microservices in cloud environments (AWS, GCP).
  • Proficiency in CI/CD and infrastructure automation tools (Jenkins, Terraform, Ansible).
  • Strong experience with Kubernetes, Docker, and JVM-based monoliths.
  • Expertise in monitoring tools (SignalFX, Splunk, Amplitude) and production incident management.
  • Scripting skills (Python, Bash, or Groovy).
  • Strong understanding of cloud-based systems and containerization.
  • Excellent communication skills and a collaborative approach to working cross-functionally.
  • Experience optimizing large-scale, customer-facing platforms in fast-paced environments.
What We Offer

Care.com offers a competitive salary range of $180,000 to $200,000 and a comprehensive benefits package that includes health insurance, life and disability insurance, a generous 401(k) employer matching program, paid holidays, and paid time off (PTO). We're an equal opportunity employer and welcome applications from diverse candidates who share our passion for using technology to make a positive impact.



  • Dallas, Texas, United States The Goldman Sachs Group Full time

    Job Title: Site Reliability Engineer
    At Goldman Sachs, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability and reliability of our firm's most critical platform services.
    Key Responsibilities: Develop and implement automation tooling to improve the...


  • Dallas, Texas, United States Glocomms Full time

    Job Title: Site Reliability Engineer
    Glocomms is seeking a highly skilled Site Reliability Engineer to join their team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the company's cloud infrastructure.
    Responsibilities: Design and implement scalable and highly available cloud infrastructure. Develop and...


  • Dallas, Texas, United States Bayone Full time

    Job Title: Site Reliability Engineer
    Bayone is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining highly available and scalable applications deployed in Azure.
    Key Responsibilities: Design and implement automation tools and scripts to streamline...


  • Dallas, Texas, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the availability, reliability, and performance of our applications and infrastructure.
    Key Responsibilities: Design, implement, and maintain scalable and...


  • Dallas, Texas, United States STIAOS Technologies Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at STIAOS Technologies in Dallas, TX. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our ecommerce platform.
    Key Responsibilities: Collaborate with cross-functional teams to identify...


  • Dallas, Texas, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer
    At Diverse Lynx LLC, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the availability, reliability, and performance of our applications and infrastructure.
    Key Responsibilities: Design, implement, and maintain scalable and...


  • Dallas, Texas, United States Motion Recruitment Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at Motion Recruitment. As a Site Reliability Engineer, you will be responsible for ensuring the stability, scalability, and performance of our applications.
    About the Role
    This is a direct hire, hybrid role (3-4 days onsite) in Dallas, Texas. The...


  • Dallas, Texas, United States Motion Recruitment Full time

    Job Title: Site Reliability Engineer
    We are seeking a skilled Site Reliability Engineer to join our team at Motion Recruitment. As a Site Reliability Engineer, you will be responsible for ensuring the stability, scalability, and performance of our applications.
    About the Role
    This is a direct hire, hybrid role (3-4 days onsite) in Dallas, Texas. The ideal...


  • Dallas, Texas, United States Themesoft Inc. Full time

    Site Reliability Engineer
    At Themesoft Inc., we're seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems.
    Key Responsibilities: Foster a culture of reliability and efficiency by sharing best...


  • Dallas, Texas, United States STIAOS Technologies Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at STIAOS Technologies in Dallas, TX. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our ecommerce systems.
    Key Responsibilities: Collaborate with cross-functional teams to identify and...


  • Dallas, Texas, United States Tata Consultancy Services Full time

    Job Description
    We are seeking a highly skilled Site Reliability Engineer to join our team at Tata Consultancy Services. As an SRE Support Analyst, you will play a critical role in ensuring the stability and sustainability of our software systems.
    Key Responsibilities
    Drive the stability and sustainability of our next-generation systems and discover innovative...


  • Dallas, Texas, United States STIAOS Technologies Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at STIAOS Technologies in Dallas, TX. As a key member of our engineering team, you will be responsible for ensuring the reliability and scalability of our software systems.
    Key Responsibilities: Collaborate with cross-functional teams to identify and...


  • Dallas, Texas, United States Saxon Global Full time

    About the Role
    We are seeking a highly skilled Site Reliability Engineer to join our team at Saxon Global. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based e-commerce and retail platform.
    Key Responsibilities
    Design, develop, and maintain tools to improve the reliability,...


  • Dallas, Texas, United States The Goldman Sachs Group Full time

    Job Summary
    We are seeking a highly skilled Site Reliability Engineer to join our team at Goldman Sachs. As a Site Reliability Engineer, you will be responsible for ensuring the availability and reliability of our firm's most critical platform services.
    Key Responsibilities
    Develop and maintain automation tooling to improve the reliability of our platform and...


  • Dallas, Texas, United States Diverse Lynx Full time

    Job Description
    Role: Site Reliability Engineer/DevOps Engineer
    Location: Dallas, TX (Onsite)
    Duration: Full-time
    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our applications...


  • Dallas, Texas, United States Avetta (formerly PICS) Full time

    Be Part of Avetta's Technical Excellence Team
    As a Site Reliability Engineer at Avetta, you will play a crucial role in optimizing and scaling our global cloud-based SaaS platform. Our focus is on maintaining highly resilient and distributed systems, integrating uptime monitors, and developing scaling algorithms to enhance end-user experience.
    Key...


  • Dallas, Texas, United States Net2Source Inc. Full time

    Job Title: Site Reliability Engineering Manager
    Net2Source Inc. is a leading provider of total workforce solutions, recognized for its accelerated growth and commitment to delivering high-quality staffing services. As a Site Reliability Engineering Manager, you will play a critical role in ensuring the reliability and scalability of our systems, collaborating...


  • Dallas, Texas, United States Net2source Full time

    Job Title: Site Reliability Engineering Manager
    At Net2Source, we are seeking a highly skilled Site Reliability Engineering Manager to join our team. As a Site Reliability Engineering Manager, you will be responsible for leading a team of Site Reliability Engineers to ensure the reliability, scalability, and performance of our cloud-based infrastructure.
    Key...


  • Dallas, Texas, United States Net2Source Inc. Full time

    Job Title: Site Reliability Engineer Manager
    Net2Source Inc. is a leading provider of total workforce solutions, recognized for our accelerated growth and global presence. We are seeking an experienced Site Reliability Engineer Manager to lead our SRE team and drive operational excellence.
    Key Responsibilities: Lead and mentor a team of Site Reliability...