Cloud Site Reliability Engineer

1 week ago


Atlanta, United States Tata Consultancy Services Full time

Work Authorization: USC, GC, GC EAD only

Roles & Responsibilities

Role: Cloud Site Reliability Engineer (SRE)

  • 5+ years of hands-on experience supporting Kubernetes / OpenShift / RKE / EKS container platforms.
  • Experience with Python, Ansible, Golang, and shell scripting.
  • Kubernetes / OpenShift / Terraform certifications are a plus.
  • Strong experience in major services related to Compute, Storage, Network and Security.
  • Experience with monitoring tools like Prometheus and Dynatrace, as well as cloud native tools like Azure Monitor and Log Analytics.
  • Strong understanding of, and background working with, complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and Ping Identity or other SSO solutions.
  • Advanced knowledge of Linux OS, DNS, DHCP, Kerberos and Windows Authentication.
  • Experience with CI/CD tools (Git, Jenkins) and the GitOps model.
  • Excellent understanding of Linux / Windows operating system administration.
  • Experience in container security and vulnerability remediation.
  • Systematic problem-solving approach, sense of ownership and drive.
  • Ability to juggle competing priorities and adapt to changes in project scope.
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a team with direct responsibilities.
  • Responsible for reliability and support of the container platform on-prem and in external clouds (Azure / AWS / Google).
  • Monitor and troubleshoot performance, connectivity, security, and other issues in the container platform (OpenShift), Rancher (RKE), and Azure (AKS) environments.
  • Perform deep dives into systemic and latent reliability issues; handle incident management and problem management.
  • Identify, analyze, and resolve infrastructure vulnerabilities and application deployment issues.
  • Perform blameless RCA and partner with engineering and operations teams across the organization to roll out fixes.
  • Onboard applications and provide troubleshooting support throughout the application lifecycle on the container platform.
  • Identify and drive opportunities to improve automation, reduce toil, and improve operational excellence.
  • Partner with risk and compliance teams to bring visibility, implement the right controls, and remediate vulnerabilities.
  • Ensure resiliency during implementation and identify/fix resiliency problems by collaborating with engineering teams.
  • Be a key stakeholder in the design of cloud services and work with architecture, engineering, and product teams.
  • Participate in 24x7 on-call coverage under a follow-the-sun model.



  • Atlanta, Georgia, United States Motion Recruitment Full time

    Job Title: Site Reliability Engineer - Azure Cloud Expert. About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team in Atlanta. As a Site Reliability Engineer, you will be responsible for ensuring the scalability and reliability of our ecommerce applications on Azure cloud. Key Responsibilities: Proactively monitor and...


  • Atlanta, Georgia, United States Now100 Full time

    Job Title: Site Reliability Engineer - Cloud Infrastructure Specialist. Company Overview: Now100 is a leading provider of technology solutions, committed to delivering exceptional results for our clients. We match thoroughly vetted resources to contract, contract-to-hire, and permanent positions in all industries. Job Description: We are seeking a highly...


  • Atlanta, Georgia, United States Ditto Job Board Full time

    Job Title: Site Reliability Engineer. At Ditto, we're on a mission to unleash the full power of edge devices by removing all the plumbing required to build amazing applications. As a Site Reliability Engineer, you'll play a critical role in helping us achieve this goal. About the Role: We're seeking a highly skilled Site Reliability Engineer to join our Federal...


  • Atlanta, Georgia, United States Navtech Full time

    Job Title: Site Reliability Engineer. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Navtech. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and performance of our production systems. Key Responsibilities: Provide L4 technical support for production 24x7. Design and...


  • Atlanta, United States Softworld, a Kelly Company Full time

    The Cloud Site Reliability Engineer (SRE) works closely with cloud development team, IT operations team and business partners to streamline and implement enhanced monitoring and alerting capability across infrastructure, application layers. By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and enhancing system...


  • Atlanta, Georgia, United States Della Infotech Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Della Infotech. As a key member of our DevOps team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using AWS...


  • Atlanta, Georgia, United States Jonas Software UK Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Jonas Software UK. As a key member of our technical operations team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...


  • Atlanta, Georgia, United States IRIS Consulting Corporation Full time

    Job Description: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at IRIS Consulting Corporation. As a key member of our Retail, Site Reliability Engineering team, you will be responsible for establishing and maintaining the reliability of our cloud-based infrastructure and applications. Key Responsibilities: Design and implement...


  • Atlanta, United States Datum Technologies Group Full time

    Opening for SRE – Atlanta, GA – Hybrid. Site Reliability Engineer. Long-term contract, Atlanta, GA. Qualifications: Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation). Hands-on experience with monitoring tools such as CloudWatch, Sumo Logic, Dynatrace, Grafana,...


  • Atlanta, Georgia, United States Kobiton Full time

    About the Role: Kobiton is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of our systems and services. You will work closely with development and operations teams to build and maintain robust infrastructure, automate...


  • Atlanta, Georgia, United States Microsoft Corporation Full time

    We are seeking a highly skilled Senior Site Reliability Engineer to join our Windows Servicing and Delivery team at Microsoft Corporation. The ideal candidate will have a strong background in software engineering, network engineering, or systems administration, with a proven track record of delivering high-quality solutions that meet customer needs. As a...


  • Atlanta, Georgia, United States STORD Full time

    About the Role: Stord is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will be responsible for designing and implementing scalable, efficient, and secure infrastructure and platform solutions. You will collaborate with cross-functional teams to deliver high-quality products and services to our...


  • Atlanta, Georgia, United States SIDEARM Sports Full time

    Job Summary: At SIDEARM Sports, we're seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you'll play a critical role in ensuring the reliability, availability, and performance of our live services, which impact millions of customers across the entertainment space. Key Responsibilities: Collaborate with...