CTP Reliability and Monitoring Engineer

5 months ago


Plano, United States Diverse Lynx Full time

Responsible for ensuring the availability, performance, and reliability of our cloud-based infrastructure and services. The primary focus of this role is designing, implementing, and managing robust monitoring and alerting systems that proactively identify issues and enable timely incident response. This resource will work closely with the CTP Platform Engineering and Development teams to optimize services and maintain service uptime.

Duties include:
  • Develop and maintain comprehensive monitoring solutions for cloud-based services and applications.
  • Configure monitoring tools and systems to collect relevant metrics, logs, and traces.
  • Create custom monitoring dashboards and reports, using DataDog or other tools, to provide real-time insights into system performance and health.
  • Continuously monitor the cloud infrastructure's performance and capacity, anticipating and addressing potential scalability issues.
  • Proactively suggest and implement improvements to enhance the system's reliability, resilience, and fault tolerance.
  • Automate tasks to streamline operational processes and reduce manual intervention.
  • Collaborate with cross-functional teams to investigate and resolve critical incidents, ensuring minimal impact on end users.
  • Work with the Problem Management team to complete post-mortem analyses of incidents, identify root causes, and implement preventive measures.

Ideal Qualifications:

  • 3+ years' experience working with cloud platforms and services (AWS, Azure, GCP, etc.) in a production environment.
  • Experience using DataDog on a few projects, including recent work.
  • Solid understanding of monitoring and logging tools, such as Prometheus, Grafana, ELK stack, Splunk, etc.
  • Experience with infrastructure as code (IaC) tools, like Terraform, CloudFormation, or Ansible.
  • Strong scripting and automation skills (e.g., Python, Bash) to facilitate operational tasks.
  • Knowledge of containerization technologies (Docker, Kubernetes) and microservices architecture.
  • Familiarity with DevOps practices and Agile methodologies.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.