Site Reliability Engineer
1 month ago
The Cloud Site Reliability Engineer (SRE) works closely with the cloud development team, the IT operations team, and business partners to streamline and implement enhanced monitoring and alerting across the infrastructure and application layers. By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and improving system scalability and reliability. Their core focus is standardization and automation to build and run fault-tolerant systems. SREs typically have a background in software engineering, systems engineering, or system administration, coupled with substantial IT operations experience. SREs oversee availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
- Write and develop code to automate processes such as analyzing logs, testing production environments, and responding to issues
- Collaborate with agile teams and business partners to develop specifications that address problems and enhancement needs, with a focus on monitoring and metrics for operational readiness
- Identify bottlenecks in development and deployment processes and design automation solutions to mitigate them
- Develop new capabilities for displaying, monitoring, and alerting on key performance indicators by tracking business transactions in real time
- Maintain and grow knowledge of platform configuration management, monitoring of established metrics, and troubleshooting
- Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements
- Design and develop alert escalation and incident response automation
- Provide production support for cloud service outages and incidents, and work on both tactical and strategic plans for outage prevention
- Provide feedback on the resiliency and maintainability of solutions to Cloud and App architects
- Generate and test disaster recovery scenarios
- Implement sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management and related requirements.
Must-have technical skills:
- Should have at least 3 years of experience as a site reliability engineer on a cross-functional agile team working in Azure.
- Have working knowledge of agile development methodologies (Scrum, sprints, Kanban, etc.) and tools (e.g., Azure DevOps)
- Have at least 3 years of hands-on experience with infrastructure-as-code and automation tools such as Terraform, Ansible, and Packer, along with GitHub
- Proven experience across testing, integration, source code management, deployment and containerization
- Sound problem-solving skills, with the ability to quickly process complex information and present it clearly and simply
- Experience with cloud technologies and services including those for Compute, Storage, Databases and API Management
- On-premises-to-cloud migration experience
-
Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | JobRialto | Full time
Job Summary: The Site Reliability Engineer is responsible for ensuring the availability, scalability, and performance of critical services and systems. This role requires expertise in OpenShift and CloudFormation, along with a deep understanding of site reliability principles, container technologies, monitoring tools, and automation. Key Responsibilities: Ensure...
-
Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Navtech | Full time
Job Title: Site Reliability Engineer. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Navtech. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and performance of our production systems. Key Responsibilities: Provide L4 technical support for production 24x7. Design and...
-
Senior Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Jonas Software UK | Full time
About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Jonas Software UK. As a key member of our technical operations team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...
-
Site Reliability Engineer
1 month ago
Atlanta, Georgia, United States | Kobiton | Full time
About the Role: Kobiton is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and scalability of our systems and services. You will work closely with development and operations teams to build and maintain robust infrastructure, automate...
-
Site Reliability Engineer
1 month ago
Atlanta, United States | Softworld, a Kelly Company | Full time
The Cloud Site Reliability Engineer (SRE) works closely with the cloud development team, the IT operations team, and business partners to streamline and implement enhanced monitoring and alerting across the infrastructure and application layers. By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and improving system...
-
Senior Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Microsoft Corporation | Full time
We are seeking a highly skilled Senior Site Reliability Engineer to join our Windows Servicing and Delivery team at Microsoft Corporation. The ideal candidate will have a strong background in software engineering, network engineering, or systems administration, with a proven track record of delivering high-quality solutions that meet customer needs. As a...
-
Site Reliability Engineer
2 weeks ago
Atlanta, United States | Datum Technologies Group | Full time
Opening for SRE, Atlanta, GA (Hybrid). Site Reliability Engineer, long-term contract, Atlanta, GA. Qualifications: Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53, etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation). Hands-on experience with monitoring tools such as CloudWatch, Sumo Logic, Dynatrace, Grafana,...
-
Cloud Site Reliability Engineer
2 weeks ago
Atlanta, United States | Tata Consultancy Services | Full time
Cloud Site Reliability Engineer. Work Authorization: USC, GC, GC EAD only. Roles & Responsibilities: Role: Cloud Site Reliability Engineer (SRE). At least 5 years of hands-on experience supporting Kubernetes/OpenShift/RKE/EKS container platforms. Experience with Python, Ansible, Golang, and shell scripting. Kubernetes/OpenShift/Terraform certifications are a...
-
Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Now100 | Full time
Job Title: Site Reliability Engineer - Cloud Infrastructure Specialist. Company Overview: Now100 is a leading provider of technology solutions, committed to delivering exceptional results for our clients. We match thoroughly vetted resources to contract, contract-to-hire, and permanent positions in all industries. Job Description: We are seeking a highly...
-
Senior Site Reliability Engineering Manager
4 weeks ago
Atlanta, Georgia, United States | Microsoft Corporation | Full time
About the Role: Microsoft Corporation is seeking a highly skilled Senior Site Reliability Engineering Manager to lead the delivery of critical features in Office 365 government cloud offerings. As a key member of the Office 365 team, you will be responsible for combining your passion for quality, reliability, and creativity to drive evolution in the continuous...
-
Site Reliability Engineer Atlanta GA On Site
1 month ago
Atlanta, Georgia, United States | Motion Recruitment | Full time
Exciting Opportunity in Atlanta, GA: Motion Recruitment is seeking a highly skilled Site Reliability Engineer (SRE) to join our team in Atlanta, GA. This is an on-site position that requires a strong background in software solutions and a passion for ensuring system reliability and performance. About the Company: Our client specializes in providing cutting-edge...
-
Senior Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Pyramid Consulting | Full time
Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Pyramid Consulting, Inc. This is a contract opportunity with long-term potential and is located in Atlanta, GA. Key Responsibilities: Design and implement SLOs / SLIs / error budgets and manage reliability for infrastructure and applications. Proven experience with...
-
Senior Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Pyramid Consulting | Full time
Pyramid Consulting is seeking a talented Senior Site Reliability Engineer to join our team. This is a contract opportunity with long-term potential and is located in a major US city. The successful candidate will have a strong background in setting SLOs / SLIs / error budgets and managing reliability for infrastructure and applications. Key...
-
Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Jobs for Humanity | Full time
About the Role: FIS is seeking a Site Reliability Engineer to join our innovative Platform Service Delivery team. As a key member of our team, you will be responsible for ensuring high stability, reduced service downtime, and improved quality of service for FIS clients. Key Responsibilities: Participate in day-to-day activities of operating the payment...
-
Site Reliability Engineer
4 weeks ago
Atlanta, Georgia, United States | Motion Recruitment | Full time
Exciting opportunity in software solutions for fraud detection. This company specializes in helping businesses protect against financial threats. They are looking for a Site Reliability Engineer (SRE) to set up and maintain an on-premises environment. The role involves working with technologies like EKS and DataDog to ensure system reliability and...