Senior Site Reliability Engineer

12 hours ago


Atlanta, Georgia, United States | PagerDuty | Full time
About the Role

PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will play a crucial role in building, maintaining, and scaling the Kubernetes platform that powers PagerDuty.

Key Responsibilities
  • Triage and troubleshoot production issues, ensuring the overall health of the platform.
  • Partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform.
  • Improve the developer experience through full lifecycle support, observability, flexible connectivity, and monitoring.
  • Share expertise with the entire Engineering organization and participate in a 24/7 on-call rotation.
Requirements
  • 5+ years of experience in Platform Engineering, Site Reliability Engineering, or DevOps roles.
  • Experience managing multiple Kubernetes clusters in a production environment.
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure).
  • Experience deploying web applications on Kubernetes (Helm, ArgoCD).
  • Experience with infrastructure as code (e.g., Terraform or CloudFormation).
  • Knowledge of a dynamic language (e.g., Ruby or Python).
Preferred Qualifications
  • Experience with monitoring, observability, and logging platforms (e.g., Datadog, New Relic, Sumo Logic, Splunk).
  • Knowledge of configuration management systems (e.g., Ansible, Chef, Puppet).
  • Experience in automating releases, continuous integration/delivery systems, and relevant tools (e.g., Jenkins, CircleCI, Travis CI, Buildkite).
About PagerDuty

PagerDuty is a global leader in digital operations management, revolutionizing how critical work gets done. Our Operations Cloud powers the agility that drives digital transformation, and customers rely on us to compress costs, accelerate productivity, win revenue, sustain seamless digital experiences, and earn customer trust.

We strive to build a more equitable world by investing 1% each of company equity, product, and employee volunteer time. PagerDuty is Great Place to Work-certified, a Fortune Best Workplace for Millennials, a Fortune Best Medium Workplace, a Fortune Best Workplace in Technology, and a top-rated product on TrustRadius and G2.



  • Atlanta, Georgia, United States | Diversity Resource Staffing Inc | Full time

    Senior Site Reliability Engineer. This is an exciting opportunity for a skilled Senior Site Reliability Engineer to join our Consumer SRE Team in the IMT division, providing secure, resilient, scalable, and maintainable services for mortgage borrowers and lenders. Our client, a division of a leading financial services company, operates numerous financial and...


  • Atlanta, Georgia, United States | Genesis10 | Full time

    Job Title: Senior Site Reliability Engineer. Genesis10 is seeking a Senior Site Reliability Engineer to join our team in Atlanta, GA. This is a 12+ month contract position. About the Role: We are looking for a highly skilled Senior Site Reliability Engineer to join our team. The successful candidate will be responsible for managing and optimizing data streaming...


  • Atlanta, Georgia, United States | STORD | Full time

    About Stord: Stord is a leading commerce enablement provider of fulfillment services and technology that powers seamless checkout and delivery experiences for high-volume mid-market and enterprise brands across all channels. With a strong presence in the market, Stord manages over $5 billion of commerce annually through its fulfillment, warehousing,...


  • Atlanta, Georgia, United States | PagerDuty | Full time

    About the Role: PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will be responsible for building, maintaining, and scaling the Kubernetes platform that powers our operations. Key Responsibilities: Maintain the overall health of the platform, including triaging and troubleshooting...


  • Atlanta, Georgia, United States | Cox Communications | Full time

    About the Role: Cox Automotive is seeking a highly skilled Senior Site Reliability Engineer to join our Manheim Logistics SRE team. As a key member of our team, you will be responsible for designing and maintaining AWS infrastructure and deployment pipelines for our 15+ development teams. Key Responsibilities: Design and implement scalable and reliable cloud...


  • Atlanta, Georgia, United States | PagerDuty | Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team at PagerDuty. As a key contributor, you will be responsible for building, maintaining, and scaling our Kubernetes platform, which powers our digital operations management solutions. Key Responsibilities: Triage and troubleshoot production issues,...


  • Atlanta, Georgia, United States | Cox Communications | Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Cox Automotive. As a key member of our Manheim Logistics SRE team, you will be responsible for designing and maintaining AWS infrastructure and deployment pipelines for our 15+ development teams. Key Responsibilities: Design and implement scalable and reliable...


  • Atlanta, Georgia, United States | Diversity Resource Staffing Inc | Full time

    Job Summary: Diversity Resource Staffing Inc is seeking a highly skilled Senior Site Reliability Engineer to join our Consumer SRE Team. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the security, resilience, scalability, and maintainability of our services for mortgage borrowers and lenders. About the Role: As a Senior Site...


  • Atlanta, Georgia, United States | PagerDuty | Full time

    About the Role: PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will be responsible for building, maintaining, and scaling the Kubernetes platform that powers PagerDuty. Key Responsibilities: Ensure the overall health of the platform, including triaging and troubleshooting production...


  • Atlanta, Georgia, United States | Motion Recruitment | Full time

    Job Title: Senior Site Reliability Engineer - Cloud Expert. Job Summary: Motion Recruitment is seeking a highly skilled Senior Site Reliability Engineer - Cloud Expert to join our client's team. As a key member of the infrastructure team, you will be responsible for designing, implementing, and maintaining scalable and highly available cloud infrastructure on...


  • Atlanta, Georgia, United States | Datum Technologies Group | Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Datum Technologies Group. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Implement and improve monitoring, alerting,...


  • Atlanta, Georgia, United States | STORD | Full time

    About Stord: Stord is a leading commerce enablement provider of fulfillment services and technology that powers seamless checkout and delivery experiences for high-volume mid-market and enterprise brands across all channels. Job Description: We are seeking a mission-driven Senior Site Reliability Engineer to be a driving force behind an exceptionally resilient,...


  • Atlanta, Georgia, United States | STORD | Full time

    About the Role: Stord is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our cloud infrastructure team, you will be responsible for designing and implementing scalable, secure, and efficient cloud infrastructure solutions. Key Responsibilities: Collaborate with cross-functional teams to design and implement CI/CD...


  • Atlanta, Georgia, United States | Calsoft Labs Inc. | Full time

    Job Title: Site Reliability Engineer. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Calsoft Labs Inc. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design and develop scalable and reliable...


  • Atlanta, Georgia, United States | Disability Solutions | Full time

    Job Title: Sr Engineer, Site Reliability. At T-Mobile, we're committed to empowering our employees to drive innovation and excellence. As a Sr Engineer, Site Reliability, you'll play a critical role in ensuring the reliability and scalability of our IT services. Key Responsibilities: Design and implement scalable and reliable software systems, leveraging cloud...


  • Atlanta, Georgia, United States | Motion Recruitment | Full time

    Location: Atlanta, Georgia. Employment Type: Hybrid, Direct Hire. Salary: $150k - $170k. A prominent organization within the financial services sector is seeking a talented individual to enhance their team. They are on the lookout for a Senior Site Reliability Engineer to contribute full-time in their Atlanta office, specifically on-site Monday, Tuesday, and...


  • Atlanta, Georgia, United States | Motion Recruitment | Full time

    Senior Site Reliability Engineer. We are seeking a highly skilled Senior Site Reliability Engineer to join our team in Atlanta, GA. As a key member of our infrastructure team, you will be responsible for ensuring the reliability and scalability of our cloud-based platform. About the Role: This is a full-time position that requires a minimum of 8 years of...


  • Atlanta, Georgia, United States | Geotab | Full time

    About Geotab: Geotab is a global leader in IoT and connected transportation, certified as a "Great Place to Work™." We're a company of diverse and talented individuals who work together to help businesses grow and succeed, and increase the safety and sustainability of our communities. Job Summary: We're seeking a Site Reliability Engineer to provide escalated...


  • Atlanta, Georgia, United States | Motion Recruitment | Full time

    Job Title: Senior Site Reliability Engineer II. At Motion Recruitment, we are seeking a highly skilled Senior Site Reliability Engineer II to join our team. As a key member of our SRE/Platform team, you will be responsible for ensuring the reliability and scalability of our SaaS-based AI/ML product. About the Role: Work closely with the SRE/Platform team to...