CTP Reliability and Monitoring Engineer - Plano - Diverse Lynx
5 months ago
Responsible for ensuring the availability, performance, and reliability of our cloud-based infrastructure and services. The primary focus of this role is designing, implementing, and managing robust monitoring and alerting systems that proactively identify issues and enable timely incident response. The engineer will work closely with the CTP Platform Engineering and Development teams to optimize services and maintain service uptime.
Duties include:
- Develop and maintain comprehensive monitoring solutions for cloud-based services and applications.
- Configure monitoring tools and systems to collect relevant metrics, logs, and traces.
- Create custom monitoring dashboards and reports, using DataDog or other tools, to provide real-time insights into system performance and health.
- Continuously monitor the cloud infrastructure's performance and capacity, anticipating and addressing potential scalability issues.
- Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance.
- Automate tasks to streamline operational processes and reduce manual intervention.
- Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end users.
- Work with the Problem Management team to complete post-mortem analysis of incidents, identify root causes, and implement preventive measures.
Ideal Qualifications:
- 3+ years' experience working with cloud platforms and services (AWS, Azure, GCP, etc.) in a production environment.
- Hands-on DataDog experience on several projects, including recent work.
- Solid understanding of monitoring and logging tools, such as Prometheus, Grafana, ELK stack, Splunk, etc.
- Experience with infrastructure as code (IaC) tools, like Terraform, CloudFormation, or Ansible.
- Strong scripting and automation skills (e.g., Python, Bash) to facilitate operational tasks.
- Knowledge of containerization technologies (Docker, Kubernetes) and microservices architecture.
Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.