Site Reliability Engineer
1 week ago
Optomi, in partnership with a leading technology operations center, is looking for an SRE - Cloud Platform to join their team in Plano, TX.
6-month contract-to-hire
Onsite in Plano, TX, 4 days per week
Position Summary: The SRE - Cloud Platform will focus on operating and automating scalable, resilient AWS infrastructure. Working with core AWS services such as EKS, Lambda, CloudWAN, ECR, and Systems Manager, this role will drive self-healing automation, observability, and CI/CD pipeline integration. The role applies SRE best practices to ensure the reliability, performance, and operational excellence of cloud-native platforms supporting business-critical applications. This position will collaborate closely with the Cloud Platform Development, Production Engineering, and Major Incident Management teams to resolve production issues and improve infrastructure.
What the right candidate will enjoy:
- Opportunity to work with cutting-edge AWS technologies.
- Collaborative and cross-functional team environment.
- Focus on automation, scalability, and operational excellence.
What experience the right candidate has:
- Solid understanding of SRE concepts: SLIs, SLOs, error budgets, incident response.
- Strong hands-on experience with AWS services such as EKS, Lambda, CloudWAN, and Systems Manager.
- Experience with infrastructure-as-code tools like Terraform and CloudFormation.
- Proficiency in scripting languages such as Python, Bash, or PowerShell.
- Familiarity with DevOps tools like GitHub, Harness, and Dynatrace.
What the right candidate will be responsible for:
- Build and maintain components required to automate and self-heal AWS infrastructure.
- Develop and maintain infrastructure as code (IaC) using Terraform for scalable and repeatable deployments.
- Manage container orchestration platforms and related cloud-native services.
- Define and measure SLIs, SLOs, and error budgets, and drive reliability improvements.
- Implement monitoring and observability using Dynatrace and AWS native services like CloudWatch.
- Participate in incident management and on-call rotations, and lead blameless postmortems.
- Collaborate cross-functionally to embed SRE principles into cloud platform design and operation.
- Troubleshoot network issues and manage cloud routing.
Added bonus if you have:
- Certifications such as AWS Certified DevOps Engineer or AWS Certified Solutions Architect.
- Knowledge of integration tools and technologies like MuleSoft, Camel, and message streaming services.