Senior Site Reliability Engineer 4

2 days ago


Atlanta, Georgia, United States PagerDuty Full time
About the Role

PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will be responsible for building, maintaining, and scaling the Kubernetes platform that powers PagerDuty.

Key Responsibilities
  • Ensure the overall health of the platform, including triaging and troubleshooting production issues, monitoring system capacity, and collaborating with other technical teams to uphold compliance and security best practices.
  • Partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform.
  • Continuously strive to improve the developer experience, including full lifecycle support, observability, flexible connectivity, and monitoring.
  • Share expertise with the entire Engineering organization.
  • Participate in a 24/7 on-call rotation, utilizing PagerDuty to manage on-call schedules.
Requirements
  • 5+ years of experience in Platform Engineering, Site Reliability Engineering, or DevOps roles.
  • Experience managing multiple Kubernetes clusters in a production environment.
  • Experience working on cloud-native infrastructure (e.g., AWS, GCP, Azure).
  • Experience deploying web applications on Kubernetes (Helm, ArgoCD).
  • Experience with infrastructure as code (e.g., Terraform or CloudFormation).
  • Knowledge of a dynamic language (e.g., Ruby or Python).
Preferred Qualifications
  • Experience with monitoring, observability, and logging platforms (e.g., DataDog, New Relic, SumoLogic, Splunk).
  • Knowledge of configuration management systems (e.g., Ansible, Chef, Puppet).
  • Experience in automating releases, continuous integration/delivery systems, and relevant tools (e.g., Jenkins, CircleCI, Travis CI, Buildkite).
About PagerDuty

PagerDuty is a global leader in digital operations management, revolutionizing how critical work gets done. Our Operations Cloud powers the agility that drives digital transformation, and customers rely on us to compress costs, accelerate productivity, win revenue, sustain seamless digital experiences, and earn customer trust.

We are committed to creating a diverse environment and are an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, parental status, veteran status, or disability status.



  • Atlanta, Georgia, United States PagerDuty Full time

    About the Role: PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will be responsible for building, maintaining, and scaling the Kubernetes platform that powers our operations. Key Responsibilities: Maintain the overall health of the platform, including triaging and troubleshooting...


  • Atlanta, Georgia, United States Cox Communications Full time

    About the Role: Cox Automotive is seeking a highly skilled Senior Site Reliability Engineer to join our Manheim Logistics SRE team. As a key member of our team, you will be responsible for designing and maintaining AWS infrastructure and deployment pipelines for our 15+ development teams. Key Responsibilities: Design and implement scalable and reliable cloud...


  • Atlanta, Georgia, United States Cox Communications Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Cox Automotive. As a key member of our Manheim Logistics SRE team, you will be responsible for designing and maintaining AWS infrastructure and deployment pipelines for our 15+ development teams. Key Responsibilities: Design and implement scalable and reliable...


  • Atlanta, Georgia, United States Diversity Resource Staffing Inc Full time

    Senior Site Reliability Engineer: This is an exciting opportunity for a skilled Senior Site Reliability Engineer to join our Consumer SRE Team at IMT division, providing secure, resilient, scalable, and maintainable services for mortgage borrowers and lenders. Our client, a division of a leading financial services company, operates numerous financial and...


  • Atlanta, Georgia, United States PagerDuty Full time

    PagerDuty empowers teams of all kinds to do the critical work that moves business forward through the PagerDuty Operations Cloud. Visit our careers site to explore life at PagerDuty, discover opportunities, and sign up for job alerts. PagerDuty is seeking a Senior Site Reliability Engineer to join our SRE-Platform team. In this role you will be a key...


  • Atlanta, Georgia, United States STORD Full time

    About Stord: Stord is a leading commerce enablement provider of fulfillment services and technology that powers seamless checkout and delivery experiences for high-volume mid-market and enterprise brands across all channels. With a strong presence in the market, Stord manages over $5 billion of commerce annually through its fulfillment, warehousing,...


  • Atlanta, Georgia, United States Diversity Resource Staffing Inc Full time

    Job Summary: Diversity Resource Staffing Inc is seeking a highly skilled Senior Site Reliability Engineer to join our Consumer SRE Team. As a Senior Site Reliability Engineer, you will play a critical role in ensuring the security, resilience, scalability, and maintainability of our services for mortgage borrowers and lenders. About the Role: As a Senior Site...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Site Reliability Engineer - Cloud Expert. Job Summary: Motion Recruitment is seeking a highly skilled Senior Site Reliability Engineer - Cloud Expert to join our client's team. As a key member of the infrastructure team, you will be responsible for designing, implementing, and maintaining scalable and highly available cloud infrastructure on...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Location: Atlanta, Georgia Employment Type: Hybrid, Direct Hire Salary: $150k - $170k A prominent organization within the financial services sector is seeking a talented individual to enhance their team. They are on the lookout for a Senior Site Reliability Engineer to contribute full-time in their Atlanta office, specifically on-site Monday, Tuesday, and...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Site Reliability Engineer II. At Motion Recruitment, we are seeking a highly skilled Senior Site Reliability Engineer II to join our team. As a key member of our SRE/Platform team, you will be responsible for ensuring the reliability and scalability of our SaaS-based AI/ML product. About the Role: Work closely with the SRE/Platform team to...


  • Atlanta, Georgia, United States SUCCESS KOREA Full time

    Position Title: Reliability Engineer (Senior Level). Company Overview: SUCCESS KOREA - A leading chemical manufacturing firm. Job Responsibilities: 1. Oversee process management and maintenance operations within the chemical sector. 2. Utilize expertise in Chemical Engineering or Mechanical Engineering to enhance operational efficiency. 3. Demonstrate strong...


  • Atlanta, Georgia, United States Advansys Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Advansys. As a key member of our infrastructure team, you will be responsible for maintaining and improving the reliability, performance, and availability of our software systems. Key Responsibilities: Maintain and improve the reliability, performance, and availability...


  • Atlanta, Georgia, United States Geotab Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Geotab. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based infrastructure. You will work closely with our development team to design, implement, and maintain our cloud infrastructure, ensuring...


  • Atlanta, Georgia, United States Ultimate Software Full time

    About the Role: As a Site Reliability Engineer at Ultimate Kronos Group, you will play a critical role in ensuring the reliability and efficiency of our cloud-based services. You will be responsible for developing software solutions to enhance, harden, and support our service delivery processes. This includes automated testing, performance analysis,...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Senior Site Reliability Engineer. Job Type: Full-time. Location: Atlanta, Georgia. Job Description: A leading healthcare and software company in Atlanta, Georgia, is seeking a highly skilled Senior Site Reliability Engineer to join its team. The company specializes in cancer treatments and best practices for chemotherapy, aiming to provide the most...


  • Atlanta, Georgia, United States Cox Enterprises Full time

    This Software Engineer will be part of the Site Reliability Engineering (SRE) team. The SRE team is an innovative team devoted to providing automated solutions and services for Cox Automotive to measure, evaluate and plan for visible, reliable application delivery and maintenance. As a member of the SRE team, you will work with development teams to help...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Location: Atlanta, Georgia. Employment Type: Hybrid, Full Time. Salary: $150k - $170k. About the Role: A prominent organization in the healthcare and technology sector is seeking to enhance its team in Atlanta, Georgia. This firm specializes in cancer treatment solutions and implements optimal strategies for chemotherapy tailored to various treatment...


  • Atlanta, Georgia, United States Ultimate Software Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Ultimate Software. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and efficiency of our cloud-based services. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure solutions. Develop and maintain...


  • Atlanta, Georgia, United States Motion Recruitment Full time

    Location: Atlanta, Georgia Type: Hybrid, Direct Hire Salary: $150k - $170k A prominent player in the financial services sector is seeking a dedicated Senior Site Reliability Engineer to enhance their team. This organization prides itself on safeguarding numerous corporate environments against fraudulent activities. As a frontrunner in their field, they have...