Site Reliability Engineer
7 hours ago
Job Description
Job Type: Full-time
Location: Atlanta GA (Onsite)
Experience: 6+ years
- Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
- Debugging complex problems across an entire stack and creating solid solutions
- Developing and building CI/CD processes to improve cadence
- Using Chaos Engineering to test what you build under real-world conditions
- Triage product or system issues, then debug, track, and resolve them by analyzing the sources of issues and their impact on hardware, network, or service operations and quality.
- Participate in or lead design reviews with peers and stakeholders to decide among available technologies.
- Experience with an APM tool such as Dynatrace, New Relic, AppDynamics, or Datadog.
- Performance Measurement and Tuning: Knowledge of system performance, testing and programming; ability to monitor, measure, and optimize system performance and network communication.
- Site Reliability Engineering: Knowledge of the theories and methodologies of reliability engineering; ability to design, develop and support various tools, services and applications to maintain a reliable site environment.
- Support capacity planning, availability, scalability, security and latency considerations for new infrastructure and service provisioning as appropriate
- Improve end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence.
- Strong experience setting SLOs, SLIs, and error budgets and managing reliability for infrastructure and applications
- Partner with other SREs to share best practices and learnings from across the organization
- Scale and optimize existing infrastructure and services sustainably through mechanisms such as automation, and evolve them by improving reliability and efficiency
- Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
- Maintain infrastructure and services by measuring and monitoring system metrics to proactively identify operational efficiencies, potential outages, and security threats in Development, UAT, Staging, and Production environments
- Practice sustainable incident response and blameless postmortems
- Develop and maintain solution and operational documentation and designs for all infrastructure and services within the scope of SRE
Other Skills
- AWS SysOps Administrator OR AWS DevOps Engineer certification
- Experience with Akamai or related WAF application preferred.
- Experience with OpenShift, Kubernetes.
- Experience with setting up synthetic monitors and tracking SLAs.
- Experience with airline applications and infrastructure technology is a plus.
- Experience developing applications and/or automation running in Red Hat OpenShift is a plus.
-
Site Reliability Engineer
1 month ago
Atlanta, United States Advansys Full time Job Title: Site Reliability Engineer Location: Alpharetta, GA (Local Candidates only) Duration: Long term We seek a highly skilled Site Reliability Engineer and dynamic – Consultant. In this role you will: Maintain and improve the reliability, performance, and availability of software systems. Act as a bridge between traditional IT operations and...
-
Site Reliability Engineer
3 weeks ago
Atlanta, United States Advansys Full time Job Title: Site Reliability Engineer Location: Alpharetta, GA (Local Candidates only) Duration: Long term We seek a highly skilled Site Reliability Engineer and dynamic – Consultant. In this role you will: Maintain and improve the reliability, performance, and availability of software systems. Act as a bridge between traditional IT operations and...
-
Site Reliability Engineer
2 weeks ago
Atlanta, United States Advansys Full time Job Title: Site Reliability Engineer Want to make an application? Make sure your CV is up to date, then read the following job specs carefully before applying. Location: Alpharetta, GA (Local Candidates only) Duration: Long term We seek a highly skilled Site Reliability Engineer and dynamic – Consultant. In this role you will: Maintain and improve the...
-
Site Reliability Engineer
20 hours ago
Atlanta, United States CV Library Full time Title: Site Reliability Engineer Location: Atlanta, GA Duration: 12 months We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS cloud infrastructure and robust application monitoring capabilities. As an integral part of our team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based...
-
Site Reliability Engineer
2 months ago
Atlanta, United States ACL Digital Full time Title: Site Reliability Engineer Location: Atlanta, GA (Hybrid role, 3x days onsite/week) Type of Hire: Contract (c2c/w2) Duration: 12 months with possible extension Site Reliability Engineer (SRE) with AWS Cloud and Application Monitoring Experience. We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS cloud infrastructure and...
-
Site Reliability Engineer
2 weeks ago
Atlanta, United States Insight Global Full time Must Haves: 5+ years of C# .NET Development Experience; Experience building automated deployments; IIS application pool experience Plusses: Splunk; Scrum Experience; Cloud knowledge and experience Day-to-Day Responsibilities: A Fortune 500 client of Insight Global is seeking a Site Reliability Engineer (SRE) to join their team on a hybrid basis. As the sole SRE, you...
-
Site Reliability Engineer
3 weeks ago
Atlanta, United States Insight Global Full time Must Haves: 5+ years of C# .NET Development Experience; Experience building automated deployments; IIS application pool experience Plusses: Splunk; Scrum Experience; Cloud knowledge and experience Day-to-Day Responsibilities: A Fortune 500 client of Insight Global is seeking a Site Reliability Engineer (SRE) to join their team on a hybrid basis. As the sole SRE, you...
-
Site Reliability Engineer
3 weeks ago
Atlanta, United States Insight Global Full time Position Title: Site Reliability Engineer Location: Atlanta, GA; Portland, ME; or Chattanooga, TN (3 days/week onsite) Compensation: $130-150k Duration: Full-Time, Direct Hire Job Overview: A Fortune 500 client of Insight Global is seeking a dedicated Site Reliability Engineer (SRE) to join their team. As the sole SRE, you will play a crucial role in...
-
Site Reliability Engineer
4 weeks ago
Atlanta, United States Tata Consultancy Services Full time Job Description: Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more; Debugging complex problems across an entire stack and creating solid solutions; Developing and building CI/CD processes to improve cadence; Using Chaos Engineering to test what you build under real-world conditions; Triage product or system...
-
Site Reliability Engineer
3 months ago
Atlanta, United States Hermeus Full time Hermeus is an aerospace and defense technology company founded to radically accelerate air travel by delivering hypersonic aircraft. The company aims to develop hypersonic aircraft quickly and cost-effectively by integrating hardware-rich, iterative development with modern computing and autonomy. This approach has been validated through design, build, and...
-
Site Reliability Engineer
1 month ago
Atlanta, United States Datum Technologies Group Full time Job Details: Site Reliability Engineer; Long term contract; Atlanta, GA Qualifications: Must have Skills: Deep understanding of AWS services (Lambda, S3, SQS, IAM, Route 53 etc.) and proficiency in infrastructure as code (e.g., Terraform, CloudFormation). Hands-on experience with monitoring tools such as CloudWatch, Sumo Logic, Dynatrace, Grafana, or similar for...
-
Site Reliability Engineer
7 minutes ago
Atlanta, United States Tata Consultancy Services Full time Job Description: Job Type: Full-time Location: Atlanta GA (Onsite) Experience: 6+ years Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more; Debugging complex problems across an entire stack and creating solid solutions; Developing and building CI/CD processes to improve cadence; Using Chaos Engineering to test...
-
Site Reliability Engineer
2 months ago
Atlanta, United States Hermeus Full time Hermeus is an aerospace and defense technology company founded to radically accelerate air travel by delivering hypersonic aircraft. The company aims to develop hypersonic aircraft quickly and cost-effectively by integrating hardware-rich, iterative development with modern computing and autonomy. This approach has been validated through design, build, and...
-
Site Reliability Engineer
1 week ago
Atlanta, Georgia, United States Advansys Full time About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Advansys. As a key member of our infrastructure team, you will be responsible for maintaining and improving the reliability, performance, and availability of our software systems. Key Responsibilities: Maintain and improve the reliability, performance, and availability...
-
Senior Site Reliability Engineer
1 week ago
Atlanta, United States Cox Communications Full time This role is for an opening for a Senior Site Reliability Engineer (SRE) on the Manheim Logistics SRE team. The SRE team is tasked with designing and maintaining AWS infrastructure and deployment pipelines for Manheim Logistics' 15 development teams.
-
DevOps - Site Reliability Engineer
1 month ago
Atlanta, United States Motion Recruitment Full time A prominent insurance firm located in Atlanta is seeking skilled professionals to join their engineering team. They are currently in search of a DevOps/Senior Site Reliability Engineer for a full-time position, offering a hybrid work model at their Atlanta office. This company is at the cutting edge of innovation in content and presentation software designed...
-
Site Reliability Engineer
3 weeks ago
Atlanta, United States Elite Mente llc. Full time Role: Site Reliability Engineer (SRE) Location: Atlanta, GA Key Responsibilities: Design, implement, and maintain scalable and reliable cloud infrastructure on AWS. Monitor system performance and troubleshoot issues using AWS CloudWatch and other monitoring tools. Implement and maintain logging solutions, with a preference for Sumo Logic...
-
Lead Site Reliability Engineer
2 weeks ago
Atlanta, United States Bose Full time Lead Site Reliability Engineer Locations: US, GA - Atlanta Time Type: Full time Posted on: Posted 11 Days Ago Job Requisition ID: R27433 You know the moment. It’s the first notes of that song you love, the intro to your favorite movie, or simply the sound of someone you love saying “hello.” It’s in these moments that sound matters most. At Bose, we...