Current jobs related to Site Reliability Engineering - Atlanta - Cloud Hybrid Technologies, LLC

Site Reliability Engineer

7 days ago

Atlanta, United States Origami Risk Full time

The Site Reliability Engineer is a key force behind improving Origami’s time to resolution and advancing overall site reliability and scalability. This person participates in...
Site Reliability Engineer

3 weeks ago

Atlanta, United States Tata Consultancy Services Full time

Site Reliability Engineer (SRE) - Full Time. Location: Atlanta Metropolitan Area. Salary Range: $100,000 - $125,000 per year. Job Description: We are seeking an experienced Site Reliability Engineer to build and support a reliable application suite, implement Service‑Reliability Engineering practices, and ensure the availability, reliability, and performance...
Site Reliability Engineer

3 weeks ago

Atlanta, United States McKesson’s Corporate Full time

Remote type: Hybrid. Time type: Full time. Posted 4 days ago. Job requisition ID: JR0140698. McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more...
Site Reliability Engineer

3 weeks ago

Atlanta, United States Rx Savings Solutions Full time

McKesson is an impact‑driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more accessible and affordable. Here, we focus on the health, happiness, and well‑being of you and those we serve – we care. Rx Savings...
Site Reliability Engineer

2 weeks ago

Atlanta, United States AutoRABIT Holding Inc. Full time

About the role: AutoRABIT is looking for a Site Reliability/DevSecOps Engineer to help develop, scale and operate our cloud services. In this role you will be an experienced business professional able to implement and execute best practice operations and improvements across teams by providing visibility and recommendations for improved reliability and...
Site Reliability Engineer

6 days ago

Atlanta, United States Origami Risk LLC Full time

Overview: The Site Reliability Engineer is a key force behind improving Origami’s time to resolution and advancing overall site reliability and scalability. This person participates in efforts to identify root causes during post-incident investigations, while also identifying preventative measures to minimize future disruptions. They also assist with...
Lead Site Reliability Engineer

8 hours ago

Atlanta, Georgia, United States Cox Automotive Inc. Full time

The Lead Site Reliability Engineer will be part of the Site Reliability Engineering (SRE) team. The SRE team drives reliability, observability, and engineering practice maturity across over 150 teams made up of over a thousand engineers in our part of Cox Automotive. We build processes, documentation, and tools that scale: deep observability to detect and...
Site Reliability Engineer

2 weeks ago

Atlanta, GA, United States Tier4 Group Full time

Job Description. Position: Site Reliability Engineer (SRE) - Infrastructure. Employment Type: Full-Time Onsite Hybrid. Overview: The Site Reliability Engineer (SRE) will ensure the reliability, scalability, and performance of enterprise applications and services across cloud and on-premises environments. This role focuses on automation,...
Site Reliability Engineer — Build Reliable Cloud Apps

3 weeks ago

Atlanta, United States Tata Consultancy Services Full time

A leading IT consulting firm is looking for a Site Reliability Engineer to enhance and support their application suite. The ideal candidate will oversee the implementation of Service Reliability Engineering practices and manage application performance. The role entails using tools like CloudWatch and Dynatrace, alongside programming in Node.js and...
Site Reliability Engineer Architect

2 weeks ago

Atlanta, United States Compunnel Inc. Full time

Site Reliability Engineer Architect -- GOEDC. As a Site Reliability Engineer Architect at Compunnel Inc., you will design and implement highly resilient, fault‑tolerant architectures leveraging AWS services (EC2, Lambda, RDS, DynamoDB, ECS/EKS, etc.). Responsibilities & Qualifications: Design and implement highly resilient, fault‑tolerant architectures...

Site Reliability Engineering

3 days ago

Atlanta, United States Cloud Hybrid Technologies, LLC Full time

Overview

Site Reliability Engineering (SRE) Architect
Location: Atlanta, GA
Duration: 12 Months + Extension
Hourly Rate: DOE
Work Authorization:

As an SRE Architect, you will be a pivotal technical leader responsible for designing, building, and evolving the foundational systems and practices that ensure the reliability, scalability, performance, and efficiency of our critical services. Moving beyond day-to-day operations, you will focus on the strategic architectural direction of the SRE function, defining standards, blueprints, and frameworks that enable development teams and the fellow SRE operations team to build and operate highly resilient systems. You will leverage deep expertise in software engineering, distributed systems, cloud infrastructure, and SRE principles to influence technology choices, establish best practices, and foster a proactive culture of reliability across the organization, well beyond the observability pillar.

Key Responsibilities

Reliability Strategy & Design:
- Architect and design highly available, scalable, secure, and cost-effective infrastructure and application patterns on AWS.
- Define and evangelize SRE best practices, standards, and blueprints for service design, deployment, monitoring, and operational readiness across the engineering organization.
- Review the current observability implementation to identify gaps, and define the steps needed to reach the next level of observability maturity, providing deep insights into system health and behaviour.
- In line with overall maturity, lead the definition and implementation strategy for Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets for critical services.
- Design solutions to systematically reduce operational toil through automation and improved system design.
- Evaluate current SRE tools and automation frameworks (e.g., CI/CD pipelines, Infrastructure as Code modules, automated incident remediation, chaos engineering platforms) and suggest enhancements that improve overall capability.
- Evaluate, prototype, and recommend new technologies, tools, and methodologies to enhance system reliability, developer productivity, and operational efficiency.

Technical Leadership & Consultation:
- Act as a senior technical advisor and subject matter expert on reliability, scalability, and performance for development and platform teams.
- Provide architectural guidance during the design phase of new services and features to ensure reliability principles are embedded early (shift-left).
- Mentor and coach other SREs and engineers, fostering technical excellence and adherence to SRE principles.
- Lead architectural reviews and production readiness assessments for critical systems.

Resilience:
- Lead blameless postmortems for significant incidents, ensuring root causes are identified and systemic architectural improvements are prioritized and implemented.
- Architect and advocate for resilience patterns (e.g., circuit breaking, rate limiting, graceful degradation, chaos engineering) within applications and infrastructure.

Required Qualifications
- Proven experience in an architectural role, designing solutions for reliability, scalability, and performance.
- Deep understanding and practical application of SRE principles (SLIs/SLOs, error budgets, toil reduction, automation, incident management, postmortems).
- Expertise in cloud computing platforms (e.g., AWS), including infrastructure, networking, and security services.
- Strong experience with containerization and orchestration technologies (Kubernetes, Docker, serverless computing).
- Solid experience designing and implementing observability solutions (e.g., Dynatrace, Prometheus, Grafana, ELK/EFK Stack, Jaeger, OpenTelemetry).
- Strong programming/scripting skills (e.g., Python, Go, Bash) for automation and tool development.
- Excellent analytical, problem-solving, and strategic thinking skills.
- Strong communication, collaboration, and leadership skills, with the ability to influence technical direction across teams.

Preferred Qualifications
- Experience designing and implementing chaos engineering practices and platforms.

Cloud Hybrid is an equal opportunity employer inclusive of female, minority, disability and veterans (M/F/D/V). Hiring, promotion, transfer, compensation, benefits, discipline, termination and all other employment decisions are made without regard to race, color, religion, sex, sexual orientation, gender identity, age, disability, national origin, citizenship/immigration status, veteran status or any other protected status. Cloud Hybrid will not make any posting or employment decision that does not comply with applicable laws relating to labor and employment, equal opportunity, employment eligibility requirements or related matters. Nor will Cloud Hybrid require in a posting or otherwise U.S. citizenship or lawful permanent residency in the U.S. as a condition of employment except as necessary to comply with law, regulation, executive order, or federal, state, or local government contract.
