Lead Site Reliability Engineer

3 days ago


Plano, United States Cognizant Full time

About Cognizant’s Digital Engineering Practice:

At Cognizant Digital Engineering, a small cross-functional team composed of a Product Manager, an Architect, Full-Stack Developers, UI/UX designers and Big Data analysts builds higher-quality software faster than siloed individuals working independently. Small, nimble engineering teams generate collective empathy and camaraderie, thus increasing their ability to anticipate unforeseen development scope changes and maintain high-quality deliverables. Across our US Studio system or within client development sites, our Digital Engineering teams ideate and develop innovative cloud-based solutions following a Lean-Agile process with a DevOps culture. Working in Cognizant Digital Engineering provides DevOps engineers consistent opportunities to push digital boundaries while growing their exposure to transformational technologies.

The Role:

Cognizant is looking for an experienced and innovative Lead Site Reliability Engineer to serve our diverse base of global clients. As a member of our team, you will build cutting-edge, cloud-based software that powers modern business. An ideal candidate is someone who enjoys working in a diverse, collaborative, geographically distributed team. Similarly, the ideal candidate is an expert engineer who values the “team”, drives continuous improvement and is unafraid to challenge the legacy status quo with creative cloud-based solutions.

Location: Plano, Texas

Responsibilities:

  • Strong SRE background, with experience in Java, AWS, DevOps, deployment strategies, and monitoring tools. Candidates should have hands-on experience with Dynatrace, Splunk, CI/CD, Grafana, etc.
  • Application troubleshooting experience, with a solid grasp of core SRE metrics and concepts before going to production: uptime vs. availability, monitoring vs. observability, incidents and outages, etc.
  • Familiarity with SLOs, SLAs, SLIs, and other SRE terminology.
  • Experience deploying through CI/CD pipelines, debugging and troubleshooting issues, and coordinating with application teams working in Java, Spring Boot, Python, .NET, etc.
  • Ability to perform API performance testing using tools such as JMeter or BlazeMeter.
  • Experience performing root cause analysis (RCA) for production issues in AWS environments with multiple microservices.
  • Expertise in Terraform to manage infrastructure as code is highly desirable.
  • Troubleshoots and resolves technical issues to ensure smooth operation of applications.
  • Demonstrates and champions site reliability culture and practices and exerts technical influence throughout your team.
  • Leads initiatives to improve the reliability and stability of your team’s applications and platforms using data-driven analytics to improve service levels.
  • Collaborates with team members to identify comprehensive service level indicators, and with stakeholders to establish reasonable service level objectives and error budgets with customers.
  • Demonstrates a high level of technical expertise within one or more technical domains and proactively identifies and solves technology-related bottlenecks in your areas of expertise.
  • Acts as the main point of contact during major incidents for your application and demonstrates the skills to identify and solve issues quickly to avoid financial losses.
  • Documents and shares knowledge within your organization via internal forums and communities.

Required Skills:

  • 8+ years of relevant work experience
  • Deep proficiency in reliability, scalability, performance, security, enterprise system architecture, toil reduction, and other site reliability best practices with the ability to implement these practices within an application or platform.
  • Fluency in Java programming.
  • Proficiency and experience in observability, including white-box and black-box monitoring, SLO alerting, and telemetry collection using tools such as Splunk, Grafana, Dynatrace, Prometheus, and Datadog.
  • Proficiency in continuous integration and continuous delivery tools (e.g., Jenkins, GitLab, Terraform).
  • Experience with containers and container orchestration (e.g., ECS, Kubernetes, Docker).

Preferred qualifications, capabilities, and skills:

  • Experience with infrastructure-as-code tools such as Terraform, and with managing and supporting cloud-based applications (AWS preferred).
  • Excellent communication skills desired.

Benefits: Cognizant offers the following benefits for this position, subject to applicable eligibility requirements:

  • Medical/Dental/Vision/Life Insurance
  • Paid holidays plus Paid Time Off
  • 401(k) plan and contributions
  • Long-term/Short-term Disability
  • Paid Parental Leave
  • Employee Stock Purchase Plan

Disclaimer: The salary, other compensation, and benefits information is accurate as of the date of this posting. Cognizant reserves the right to modify this information at any time, subject to applicable law.


