Site Reliability Engineer
3 months ago
The Cloud Site Reliability Engineer (SRE) works closely with the cloud development team, the IT operations team, and business partners to streamline and implement enhanced monitoring and alerting capabilities across the infrastructure and application layers. By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and enhancing system scalability and reliability. Their core focus is standardization and automation to build and run fault-tolerant systems. Typically, SREs have a background in software engineering, systems engineering, or system administration, coupled with substantial IT operations experience. SREs oversee availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
- Write and develop code to automate processes such as analyzing logs, testing production environments, and responding to issues
- Collaborate with agile teams and business partners to develop specifications that address problems and enhancement needs, with a focus on monitoring and metrics for operational readiness
- Identify bottlenecks in development and deployment processes and design automation solutions to mitigate them
- Develop new capabilities for displaying, monitoring, and alerting on key performance indicators by tracking business transactions in real time
- Maintain and grow knowledge of platform configuration management, monitoring of established metrics, and troubleshooting
- Provide continuous feedback to development teams on system stability, defect analysis, and system enhancements
- Design and develop alert escalation and incident response automation
- Provide production support for cloud service outages and incidents and work on both tactical and strategic plans for outage prevention
- Provide feedback on resiliency and maintainability of solutions to Cloud and App architects
- Conduct disaster recovery scenario generation and testing
- Implement sustainable, audit-ready processes that support information technology controls, including deployment execution, access management, audits, incident management and related requirements.
Must-have technical skills:
- Should have at least 3 years’ experience as a site reliability engineer on a cross-functional agile team working in Azure
- Have working knowledge of agile development methodologies (Scrum, sprints, Kanban, etc.) and tools (Azure DevOps, etc.)
- Have at least 3 years of hands-on experience using IaC tools: Terraform, GitHub, Ansible, and Packer
- Proven experience across testing, integration, source code management, deployment, and containerization
- Sound problem-solving skills with the ability to quickly process complex information and present it clearly and simply
- Experience with cloud technologies and services, including those for Compute, Storage, Databases, and API Management
- On-premises to cloud migration experience
-
Reliability Engineer
5 days ago
Atlanta, Georgia, United States Allied Reliability Full time
About the Position: As a Reliability Engineer - Electrical at Allied Reliability, you will play a key role in developing and implementing strategies to improve the reliability and efficiency of our equipment and systems. This position requires a high level of technical expertise, as well as excellent analytical and problem-solving skills. Key...
-
Site Reliability Engineer
4 days ago
Atlanta, United States Softworld, a Kelly Company Full time
The Cloud Site Reliability Engineer (SRE) works closely with cloud development team, IT operations team and business partners to streamline and implement enhanced monitoring and alerting capability across infrastructure, application layers. By leveraging automation tools, SREs address and resolve issues, minimizing manual workload and enhancing system...
-
Site Reliability Engineer
3 weeks ago
Atlanta, United States Motion Recruitment Full time
Job Title: Automation Engineer - Cloud and Reliability. Job Responsibilities: Develop scripts to automate processes and reduce toil and failures. Monitor the health of applications, batch processes, and data feeds. Set up monitoring systems and develop dashboards for performance tracking. Lead and triage major incidents, investigating and troubleshooting...
-
Site Reliability Engineer
2 weeks ago
Atlanta, United States Motion Recruitment Full time
Job Title: Automation Engineer - Cloud and Reliability. Job Responsibilities: Develop scripts to automate processes and reduce toil and failures. Monitor the health of applications, batch processes, and data feeds. Set up monitoring systems and develop dashboards for performance tracking. Lead and triage major incidents, investigating and troubleshooting...
-
Atlanta, Georgia, United States Engle Martin & Associates Full time
About the Job: Engle Martin & Associates is seeking an experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our systems. You will work closely with development teams to design, deploy, and maintain scalable and reliable systems using modern...
-
Site Reliability Engineering Lead
5 days ago
Atlanta, Georgia, United States Softworld, a Kelly Company Full time
Job Description: The Cloud Site Reliability Engineer works closely with cloud development team, IT operations team, and business partners to streamline and implement enhanced monitoring and alerting capability across infrastructure, application layers. Responsibilities: Writing and developing code to automate processes, such as analyzing logs, testing...
-
Site Reliability Engineer
7 days ago
Atlanta, Georgia, United States Resource Informatics Group Inc Full time
Job Overview: As a Site Reliability Engineer at Resource Informatics Group Inc, you will be part of a team devoted to providing automated solutions and services for Cox Automotive. Your mission will be to measure, evaluate, and plan for visible, reliable application delivery and maintenance. We are looking for engineers who are passionate about...
-
Senior Site Reliability Engineer
3 weeks ago
Atlanta, United States Cox Automotive Full time
Cox Automotive is looking for a Senior Site Reliability Engineer (SRE) to join our Manheim Logistics SRE team. The SRE team is tasked with designing and maintaining AWS infrastructure and deployment pipelines for Manheim Logistics' 15+ development teams. The team has currently standardized on a Docker-based infrastructure solution and is adding...
-
Site Reliability Engineer
5 days ago
Atlanta, United States Canonical Full time
Job Description: Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers include the world's leading...
-
Site Reliability Engineer
2 weeks ago
Atlanta, United States Motion Recruitment Full time
ONLY W2. Required Skills & Experience: Manage and optimize data streaming and API components in OpenShift On premise and AWS. Proactively review the application’s APIs and processes to identify opportunities to optimize the response times for various application components. Automate various types of testing including data quality checks, automate...
-
Site Reliability Engineer
4 weeks ago
Atlanta, United States Motion Recruitment Full time
ONLY W2. Required Skills & Experience: Manage and optimize data streaming and API components in OpenShift On premise and AWS. Proactively review the application’s APIs and processes to identify opportunities to optimize the response times for various application components. Automate various types of testing including data quality checks, automate...
-
Site Reliability Engineer
5 days ago
Atlanta, United States Disability Solutions Full time
Position Type: Full time. Type of Hire: Experienced (relevant combo of work and education). Education Desired: Bachelor's Degree. Travel Percentage: 0%. Job Description: Are you curious, motivated, and forward-thinking? At FIS you’ll have the opportunity to work on some of the most challenging and relevant issues in financial services and technology. Our...
-
Site Reliability Specialist
4 days ago
Atlanta, Georgia, United States Disability Solutions Full time
Job Overview: FIS is a leading provider of disability solutions, and we are seeking a skilled Site Reliability Specialist to join our team. Job Description: We are looking for an experienced professional who can participate in all day-to-day activities of operating the payment infrastructure to maintain high stability, reduce service downtime, and improve quality...
-
Site Reliability Engineer
7 days ago
Atlanta, United States ACL Digital Full time
Title: Site Reliability Engineer. Work Location: Atlanta, GA. Duration: 12 months. Site Reliability Engineer (SRE) with AWS Cloud and Application Monitoring Experience. We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS cloud infrastructure and robust application monitoring capabilities. As an integral part of our team, you...
-
Site Reliability Expert
2 days ago
Atlanta, Georgia, United States Inabia Software & Consulting Inc. Full time
About the Position: We are looking for a highly skilled Site Reliability Engineer to join our team at Inabia Software & Consulting Inc. As a key member of our engineering team, you will be responsible for designing, building, and operating large-scale distributed systems. Main Responsibilities: Kubernetes Cluster Management: Design, deploy, and manage...
-
Site Reliability Engineering Professional
6 days ago
Atlanta, Georgia, United States RIT Solutions, Inc. Full time
Responsibilities and Qualifications: The DevOps Engineer will be responsible for ensuring the reliability, scalability, and performance of our cloud-hosted applications. This role requires a strong understanding of DevOps practices, including CI/CD pipelines and automation scripts. The ideal candidate will have experience working with Kubernetes, AWS EKS, and...
-
Motion Recruitment | Site Reliability Engineer
4 weeks ago
Atlanta, United States Motion Recruitment Full time
ONLY W2. Required Skills & Experience: Manage and optimize data streaming and API components in OpenShift On premise and AWS. Proactively review the application’s APIs and processes to identify opportunities to optimize the response times for various application components. Automate various types of testing including data quality checks, automate...
-
Plant Electrical Engineer
5 days ago
Atlanta, Georgia, United States Allied Reliability Full time
Job Summary: We are seeking a Plant Electrical Engineer to join our maintenance team at Allied Reliability. As an integral part of our operations, you will play a critical role in ensuring the smooth operation of our plant's machinery and equipment. This is an excellent opportunity for a highly skilled engineer with experience in industrial electrical...
-
Site Reliability Expert
6 days ago
Atlanta, Georgia, United States RIT Solutions, Inc. Full time
About Us: RIT Solutions, Inc. is a leading provider of innovative technology solutions. We are committed to delivering high-quality products and services that meet the evolving needs of our customers. Job Description: We are seeking a skilled Mid-Level DevOps Engineer to join our team. As a key member of our engineering team, you will be responsible for...
-
Azure Site Reliability Engineer
2 weeks ago
Atlanta, United States Motion Recruitment Partners, LLC Full time
A leading provider in the world of insurance protection is looking to add a Site Reliability Engineer to their team. Integrating physical and digital risk mitigation solutions to reduce fraud in the insurance sector is key. Day to day tasks involve using Azure services, including AKS and Azure DevOps for CI/CD pipelining. Working in the environment on their...