Site Reliability Engineer

3 weeks ago


Atlanta, United States RIT Solutions, Inc. Full time

Atlanta, St. Louis, or Denver, onsite 3 days per week, 2 days remote

Looking for a highly motivated Site Reliability Engineer who is capable of building and running large-scale, massively distributed, fault-tolerant systems. The individual will work with teams across the organization to ensure the reliability of core services and keep an eye on capacity and performance.
• Responsible for blameless postmortems and proactive identification of potential outages, both of which factor into iterative improvement.
• Experience in designing and deploying large-scale, multi-data-center web applications.
• Work closely with dev and ops teams to build highly available, cost-effective systems.
• Create new tools and scripts designed for auto-remediation of incidents.
• Design/Implementation of Big Data technologies, including Hadoop, MongoDB, Kafka, RabbitMQ, Zookeeper, Spark, ELK, etc.
• Responsible for establishing end-to-end monitoring and alerting on all critical aspects to ensure SLAs are met and to get proactive notifications of possible issues for all systems.
• Design platforms for extremely high uptime metrics.
• Works well independently and requires little or no supervision.
• Work with the cloud operations team to resolve trouble tickets, develop and run scripts, and troubleshoot.
• Fully understand the application and its microservice interactions.
• Design/Implementation of containers/applications in scalable HA/DR multi-tier cloud environments, including new system design, documentation, implementation, and deployment.
• Participate in a 24x7 on-call rotation.

Job Requirements (7+ years of experience in the following areas):
• Experience in providing L4 technical support for production 24x7.
• Strong experience in production support and operations.
• Design/Implementation of network and presentation tier technologies, including F5, Apache, Nginx, etc.
• Experience in Performance Testing/Tuning/Monitoring, maximizing system uptime and availability, ensuring functional and performance SLAs.
• Experience with monitoring application/infrastructure performance and availability.
• Automation experience with build/deployment, software configuration, continuous integration, continuous delivery, and release engineering tasks in JavaEE/C++ environments.
• Experience in automating manual processes using Python, Ruby, Unix Shell (bash, ksh), Perl, Ant, etc.
• Installing, Configuring, Administering, and Tuning of JavaEE Application Servers/Containers like Tomcat, WebSphere, etc.
• Installing/maintaining/administering software on Unix, Linux, and Windows servers.
• Experience with Web service technologies, including REST, SOAP, JSON, XML.
• Experience with Cloud Platforms and virtualization Technologies.
• Deploying and automating infrastructure/applications in cloud environments using Chef, RPM, etc.
• Working closely with Development, QA, Product Management, and Production Ops teams to make sure product releases ship on time and with quality.
• Hands-on experience configuring and administering SCM (Git, SVN), build (CMake, Makefiles, Maven), CI (Jenkins), and CD automation tools.
• Experience with database (RDBMS, NoSQL) technologies is a plus.
• Experience with Performance Testing is a plus.
• Configuring and maintaining SDLC Environments.
• Experience in Agile Methodologies and processes.
• Strong Automation, problem-solving skills, and ability to follow through to completion.
• Demonstrated leadership skills through a variety of activities, including leading or mentoring technical staff.
• Strong verbal/written communication skills.
• Participate in a 24x7 on-call rotation.


