Current jobs related to Site Reliability Engineer - Atlanta - Datum Software


  • Atlanta, Georgia, United States T-Mobile US, Inc. Full time

    About the Role: We're looking for a talented Site Reliability Engineer to join our team at T-Mobile US, Inc. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our systems and services. Key Responsibilities: Design, implement, and maintain scalable and reliable systems and services. Collaborate with...


  • Atlanta, United States Advansys Full time

    Job Title: Site Reliability Engineer. Location: Alpharetta, GA (local candidates only). Duration: Long term. We seek a highly skilled and dynamic Site Reliability Engineer (Consultant). In this role you will maintain and improve the reliability, performance, and availability of software systems. Act as a bridge between traditional IT operations and...


  • Atlanta, Georgia, United States Datum Technologies Group Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Datum Technologies Group. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Implement and improve monitoring, alerting,...


  • Atlanta, United States Advansys Full time

    Job Title: Site Reliability Engineer. Want to make an application? Make sure your CV is up to date, then read the following job specs carefully before applying. Location: Alpharetta, GA (local candidates only). Duration: Long term. We seek a highly skilled and dynamic Site Reliability Engineer (Consultant). In this role you will maintain and improve the...


  • Atlanta, United States CV Library Full time

    Title: Site Reliability Engineer. Location: Atlanta, GA. Duration: 12 months. We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS cloud infrastructure and robust application monitoring capabilities. As an integral part of our team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based...


  • Atlanta, United States Insight Global Full time

    Must Haves: 5+ years of C# .NET development experience; experience building automated deployments; IIS application pool experience. Pluses: Splunk; Scrum experience; cloud knowledge and experience. Day-to-Day Responsibilities: A Fortune 500 client of Insight Global is seeking a Site Reliability Engineer (SRE) to join their team on a hybrid basis. As the sole SRE, you...


  • Atlanta, Georgia, United States Calsoft Labs Inc. Full time

    Job Title: Site Reliability Engineer. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Calsoft Labs Inc. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design and develop scalable and reliable...


  • Atlanta, Georgia, United States Disability Solutions Full time

    Job Title: Sr Engineer, Site Reliability. At T-Mobile, we're committed to empowering our employees to drive innovation and excellence. As a Sr Engineer, Site Reliability, you'll play a critical role in ensuring the reliability and scalability of our IT services. Key Responsibilities: Design and implement scalable and reliable software systems, leveraging cloud...


  • Atlanta, United States Tata Consultancy Services Full time

    Job Description: Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more. Debugging complex problems across an entire stack and creating solid solutions. Developing and building CI/CD processes to improve cadence. Using Chaos Engineering to test what you build under real-world conditions. Triage product or system...


  • Atlanta, Georgia, United States Geotab Full time

    About Geotab: Geotab is a global leader in IoT and connected transportation, certified as a "Great Place to Work™." We're a company of diverse and talented individuals who work together to help businesses grow and succeed, and increase the safety and sustainability of our communities. Job Summary: We're seeking a Site Reliability Engineer to provide escalated...


  • Atlanta, United States Hermeus Full time

    Hermeus is an aerospace and defense technology company founded to radically accelerate air travel by delivering hypersonic aircraft. The company aims to develop hypersonic aircraft quickly and cost-effectively by integrating hardware-rich, iterative development with modern computing and autonomy. This approach has been validated through design, build, and...


  • Atlanta, United States Tata Consultancy Services Full time

    Job Description. Job Type: Full-time. Location: Atlanta, GA (Onsite). Experience: 6+ years. Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more. Debugging complex problems across an entire stack and creating solid solutions. Developing and building CI/CD processes to improve cadence. Using Chaos Engineering to test...


  • Atlanta, Georgia, United States Advansys Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Advansys. As a key member of our infrastructure team, you will be responsible for maintaining and improving the reliability, performance, and availability of our software systems. Key Responsibilities: Maintain and improve the reliability, performance, and availability...


  • Atlanta, Georgia, United States Geotab Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Geotab. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based infrastructure. You will work closely with our development team to design, implement, and maintain our cloud infrastructure, ensuring...


  • Atlanta, United States Cox Communications Full time

    This role is an opening for a Senior Site Reliability Engineer (SRE) on the Manheim Logistics SRE team. The SRE team is tasked with designing and maintaining AWS infrastructure and deployment pipelines for Manheim Logistics' 15 development teams.

Site Reliability Engineer

4 months ago


Atlanta, United States Datum Software Full time

Site Reliability Engineer

Long Term Contract

Atlanta, GA

 

 

 Qualifications:

Manage and optimize data streaming and API components in on-premises OpenShift and AWS.

Proactively review the application's APIs and processes to identify opportunities to optimize the response times for various application components.

Automate various types of testing, including data quality checks, and automate delivery and deployment to production.

Develop integrations between the application, both on-premises and in AWS, and our third-party tools (ServiceNow, VersionOne, Sumo).

Work with teams to create SLIs and SLOs.

Actively monitor and lead troubleshooting of degraded performance and hard-to-define issues for the platform applications, develop the solution, and document artifacts from root cause analysis in the backlog.

Evolve the cloud infrastructure ecosystem for our application suite by experimenting with emerging technologies and completing prototypes to understand their benefits.

Design and develop CI/CD pipelines to deploy various application artifacts, including APIs and data processing jobs.

Analyze, design, and develop the artifacts that configure monitoring and alerting metrics so that support engineers can proactively validate, troubleshoot, and resolve issues in a timely manner.

Maintain data integrity and access control by using AWS security tools and services such as HSM, IAM, etc.

Understand and develop tools to monitor AWS billing for the services, generate cost-related reports, and help develop and implement cost optimization strategies.

Work with enterprise security architects to design and implement data security tools and measures, data encryption, and key management; design and develop solutions to address the security vulnerabilities discovered by the internal security audit team, as well as by vendors, the security community, etc.; design and develop solutions that let the support team regularly scan for, review, and fix security issues.

Regularly and proactively monitor and analyze the capacity and performance of the platform, and work with the architecture team to design and implement elastic infrastructure that accommodates irregular bursts of user traffic and requests.

Work with the architecture team to develop a backup strategy and implement the backup solution for critical data and application components, for service restoration and disaster recovery purposes.

Work with the architecture, infrastructure, and application teams to provide input on continuous improvement of design, performance, and security.

 

Preferred Skills:

 

Deep understanding of the operations of AWS cloud platforms.

Must be well versed in automation, scripting, and monitoring, including use of tools from the major cloud platforms, including but not limited to OpenShift, CloudFormation, Terraform, Ansible, Shell, and Python.

Candidates with significant technical knowledge of infrastructure layers are preferred, including but not limited to: Linux OS, major virtualization platforms, traditional and software-defined networking, load balancers, firewalls, API tools, element/performance/intelligent monitoring tools, storage, backup strategy, etc.

Significant knowledge and experience in end-to-end operations for enterprise systems and applications, including driving issue resolution for mission-critical systems.

Must have experience working to automate, operationalize, and improve Development/QA using CI/CD tools (GitLab, GitHub, Jenkins, Maven, Gradle, Nexus).

Working experience with Software Release Management.

Desired Qualification:

BS degree in Computer Science or a related technical field, or equivalent practical experience.

Minimum Experience:

3+ years of related DevOps or SysOps engineering experience with a focus on major cloud platforms (AWS preferred).

2+ years of application development experience, including data streaming and deploying/monitoring high-availability critical application components.

1+ years in a Site Reliability Engineering organization preferred.

Overall, 4-6 years of experience.

 

Responsibilities:

 

As an engineer with the Retail Site Reliability Engineering team, you will be at the forefront of Cloud and Big Data technology.

In this role you will establish yourself as a technical leader by exposing yourself to a broad range of industry-leading technologies that will help to drive acceleration.

The ideal candidate will have expert design and development capabilities and be positioned to contribute to a growing set of services and features for the ecosystem.

This role will support highly available, business-critical applications.

This role will serve as the escalation point for complex and hard-to-define issues in both on-premises and AWS environments.

We are seeking talented engineers who are well versed in DevOps technologies, automation, infrastructure orchestration, configuration management, continuous integration, and troubleshooting of complex issues, and who are not constrained by how "things are usually done".

 

"All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran."