Site Reliability Engineer

2 weeks ago


Atlanta, United States ClifyX, INC Full time
Senior Specialist - Software Engineering
Requisition ID: 1289381
Posting Start Date: Apr 24, 2024
Posting End Date: Apr 25, 2024
Recruiter: Ramachandra Reddy

Job Code: 1289381
Job Title: Site Reliability Engineer
Work Location: Atlanta, GA

Job Description:

Provisioning cloud infrastructure (AWS, GCP) using infrastructure as code (Terraform)

Creating pipelines and automation to deploy Dockerized applications onto Kubernetes clusters
Build, release and configuration management of production systems
Consulting with engineering teams to help them leverage our platform and tools to run their applications
Developing, deploying, and maintaining tools built from the ground up in support of self-service, quality, security, and compliance initiatives.
Collaborating with product, architecture, and engineering groups to build a platform that streamlines application developer productivity and throughput
Participating in metrics gathering, monitoring, and alerting activities, as well as on-call rotations.
Solving new problems with modern technologies.
Automating infrastructure builds/configurations
Building and managing CI/CD pipelines using Jenkins.
Defining, implementing, and assigning ownership for stability/reliability (SLIs, SLOs, error budgets)
Collaborating with tribes/dev teams on reliability development (fixes, logging, delivery metrics)

Key Skillsets:

3+ years of experience developing and/or administering software in public cloud.
Experience in monitoring infrastructure and application uptime and availability to ensure functional and performance objectives are met.
Experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript and/or Node.js
Demonstrable cross-functional knowledge of systems, storage, networking, security and databases
System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
Proficiency with continuous integration and continuous delivery tooling and practices
Experience managing Infrastructure as code via tools such as Terraform or CloudFormation
Experience in setting up and managing/modifying CI/CD pipelines using Jenkins.
Significant experience in configuring industry-leading infrastructure/application monitoring tools (Stackdriver, Kibana, Grafana, Datadog, Splunk, Dynatrace, AppDynamics, etc.)

  • Atlanta, United States MethodHub Full time

    Job Title: Site Reliability Engineer (Performance Monitoring) Location: Remote Duration: Long Term (W2 Only) Client: Direct Job Description: 6-8 years of professional experience as a Site Reliability Engineer (SRE). Software development “hands on” engineer with excellent understanding of SDLC application delivery. Ability to translate functional and...


  • Atlanta, United States Tech Providers Inc. Full time

    Site Reliability Engineer Atlanta, GA (Hybrid) 06+ Months Contract to Hire Skills: Top 5 Must Haves: Extensive/Strong AWS experience, experience in designing, deploying, and managing scalable/reliable cloud-based infrastructure; Software Engineering background/experience (Python, JavaScript, Bash, etc.); In-depth knowledge of infrastructure as code (IaC) tools, like...


  • Atlanta, United States Blackwomenintech Full time

    Join a team recognized for leadership, innovation and diversity. As a Site Reliability Engineer here at Honeywell, you will play a critical role in ensuring the reliability, availability, and performance of our systems and applications. You will work closely with cross-functional teams to identify and resolve issues, implement automation solutions, and drive...


  • Atlanta, United States Hermeus Full time

    Hermeus is an aerospace and defense technology company founded to radically accelerate air travel by delivering hypersonic aircraft. The company aims to develop hypersonic aircraft quickly and cost-effectively by integrating hardware-rich, iterative development with modern computing and autonomy. This approach has been validated through design, build, and...


  • Atlanta, Georgia, United States Ford Motor Company Full time

    At Ford Motor Company, we believe freedom of movement drives human progress. We also believe in providing you with the freedom to define and realize your dreams. With our incredible plans for the future of mobility, we have a wide variety of opportunities for you to accelerate your career potential as you help us define tomorrow's transportation. As a key...


  • Atlanta, United States LTIMindtree Full time

    About Us: LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology...


  • Atlanta, United States CNA Search Full time

    Functions and Responsibilities Manage production environments by monitoring availability and taking a holistic view of system health Automate reliability, quality, and repeatability of cloud environments Proactively ensure the highest levels of systems and infrastructure availability Responsible for maintaining tools/systems/platforms for cloud service...


  • Atlanta, United States Datum Software Full time

    Site Reliability Engineer Long Term Contract Atlanta, GA Qualifications: Manage and optimize data streaming and API components in OpenShift On-premises and AWS. Proactively review the application's APIs and processes to identify opportunities to optimize the response times for various application components. Automate various types of testing...


  • Atlanta, Georgia, United States LTIMindtree Full time

    About Us: LTIMindtree is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 700 clients, LTIMindtree brings extensive domain and technology expertise...


  • Atlanta, Georgia, United States Regions Full time

    Thank you for your interest in a career at Regions. At Regions, we believe associates deserve more than just a job. We believe in offering performance-driven individuals a place where they can build a career --- a place to expect more opportunities. If you are focused on results, dedicated to quality, strength and integrity, and possess the drive to succeed,...


  • Atlanta, United States VySystems Full time

    Skills Required: ∙ 5 or more years of experience as an application developer or SRE. ∙ 2 or more years of experience with ops automation using a scripting language such as Python or Ansible. ∙ Experience with an APM tool such as Dynatrace, New Relic, AppDynamics, or Datadog is preferred. ∙ Site Reliability Engineering: Knowledge of the theories and...


  • Atlanta, United States IRIS Consulting Corporation Full time

    As an engineer with the Retail Site Reliability Engineering team, you will be at the forefront of Cloud and Big Data technology. In this role you will establish yourself as a technical leader by exposing yourself to a broad range of industry-leading technologies that will help to drive acceleration. The ideal candidate will have expert design and development...


  • Atlanta, United States Flexton Full time

    Title: Sr. Site Reliability Engineer Location: Atlanta, GA (Hybrid) Duration: 12+ Months Job Description: Extensive/Strong AWS experience: experience in designing, deploying, and managing scalable/reliable cloud-based infrastructure; Software Engineering background/experience (Python, JavaScript, Bash, etc.); In-depth knowledge of infrastructure as code (IaC)...


  • Atlanta, United States At Datum Tech Group Full time

    Long Term Contract Atlanta, GA Qualifications Manage and optimize data streaming and API components in OpenShift On-premises and AWS. Proactively review the application's APIs and processes to identify opportunities to optimize the response times for various application components. Automate various types of testing including data quality checks, automate...


  • Atlanta, United States Motion Recruitment Full time

    A proven performer in HealthTech and Digital Recording is looking to continue to expand their DevOps team with an ambitious and skilled Site Reliability Engineer. The ideal candidate will be working on large-scale projects, primarily involving automation expansion and cloud migration. You will be developing in an environment that is primarily AWS, and...