Site Reliability Engineer

3 weeks ago


San Leandro, California, United States NTT DATA Services Full time

Req ID:

NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.

We are currently seeking a Site Reliability Engineer (FTE / Hybrid) to join our team in San Leandro, California (US-CA), United States (US).

Job Duties and Responsibilities:

  • 10+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 10+ years of experience in production support/site reliability engineering teams, with a continued focus on improving platform health
  • Familiar with Agile or other rapid application development practices
  • Hands-on expertise with automated testing, process automation, and building dashboards using APM tools.
  • Experience with distributed (multi-tiered) systems, algorithms, relational databases, and NoSQL databases.
  • Knowledge of and exposure to caching tools (Redis, Memcached) or messaging tools such as MQ and Kafka.
  • Must have working knowledge of APM tools such as Splunk, GCL, ELK, Grafana, Prometheus, etc.
  • Able to create dashboards using GCL/Splunk/ELK and set up alerts.
  • Working knowledge of CI/CD is a plus: source control such as Git, and continuous integration and release tools such as Jenkins and UCD Release.
  • Ability to work with engineering teams across the ecosystem, such as Security, Networking, and Infrastructure, on challenges that can impact platform health and resiliency.
  • Shell scripting and DevOps tools such as Ansible, with good knowledge of YAML for writing playbooks.
  • Experience with distributed storage technologies such as NFS, as well as dynamic resource management frameworks such as PCF, Kubernetes/OpenShift, AWS, or Azure.
  • Tech stack: Java/J2EE (Spring, Spring Boot), Python, shell scripting, Kafka, Oracle, MongoDB, etc.
  • Able to work on shift duty in a 12/7 support organization.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
  • Bachelor's degree in computer science, computer science engineering, or related experience required.

Job Expectations:

  • You will be a core member of an SRE support team, using the latest technology tools to write code and test cases, work with API specs, and automate in order to maintain the resiliency, performance, and availability of Digital Sales & Marketing platforms.
  • Strong, relevant experience supporting web/API platforms built on a Java/JavaScript stack (Spring/Spring Boot, JavaScript with Angular/React).
  • Proficiency with legacy infrastructure as well as cloud infrastructure (on-premises and third-party), such as PCF or Azure.
  • Identify opportunities to adopt new technologies, remove toil, and continuously drive efficiency and optimization.
  • Proactively monitor application performance through Splunk, application dashboards, AppDynamics, Dynatrace, etc.
  • Represent platform engineering teams during production outages and collaborate with engineering teams to resolve them. Collaborate with stakeholders across the engineering function to own and drive root cause analysis (RCA) and work toward permanent resolution.
  • Plan, support, execute and comply with governance programs/processes in support of a strong control environment in your functional area. Leverage process documentation to improve operational controls and identify and remediate process deficiencies.
  • Proactively identify, communicate, mitigate, and escalate risk originating from non-compliance with processes, operational errors, and data integrity issues in all applicable processes.
  • Ability to influence SRE practices within and outside the team to enable a strong DevOps culture within the organization.
  • Work with engineering teams to maintain SLAs and SLOs. Continually look for opportunities to improve platform metrics and communicate them to stakeholders.
  • Exposure to and proficiency in different API styles, such as SOAP, REST, microservices, etc.
  • Working knowledge of Unix, Linux, and Postman.
  • Willingness to work on-site at the location stated in the job opening (this position offers a hybrid work schedule).

Basic Qualifications:

  • 5+ years of experience in Java integration development (Java 8, Camel, Spring Boot, Spring Framework, microservices, REST APIs)

About NTT DATA

NTT DATA is a $30 billion trusted global innovator of business and technology services. We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long term success. As a Global Top Employer, we have diverse experts in more than 50 countries and a robust partner ecosystem of established and start-up companies. Our services include business and technology consulting, data and artificial intelligence, industry solutions, as well as the development, implementation and management of applications, infrastructure and connectivity. We are one of the leading providers of digital and AI infrastructure in the world. NTT DATA is a part of NTT Group, which invests over $3.6 billion each year in R&D to help organizations and society move confidently and sustainably into the digital future. Visit us at

NTT DATA is an equal opportunity employer and considers all applicants without regard to race, color, religion, citizenship, national origin, ancestry, age, sex, sexual orientation, gender identity, genetic information, physical or mental disability, veteran or marital status, or any other characteristic protected by law. We are committed to creating a diverse and inclusive environment for all employees. If you need assistance or an accommodation due to a disability, please inform your recruiter so that we may connect you with the appropriate team.



  • San Leandro, California, United States VDart Inc Full time

    Job Overview. Position: Site Reliability Engineer. Company: VDart Inc. Role Summary: We are seeking a skilled Site Reliability Engineer with a strong background in Java to enhance our platform's performance and reliability. The ideal candidate will have a proven track record in production support and a commitment to optimizing system health. Key...


  • San Jose, California, United States Adobe Full time

    Adobe's Reliability Engineering team is looking for a Site Reliability Engineer (SRE) to help build and operate services like Adobe Sign. Adobe Sign is the fastest and easiest way to get contracts signed and filed. You have a track record as a site reliability engineer in large-scale SaaS businesses, and a strong...


  • San Mateo, California, United States 2K Full time

    Who We Are: Founded in 2005, 2K Games is a global video game company, publishing titles developed by some of the most influential game development studios in the world. Our studios responsible for developing 2K's portfolio of world-class games across multiple platforms include Visual Concepts, Firaxis, Hangar 13, CatDaddy, Cloud Chamber, and HB Studios. Our...


  • San Jose, California, United States Zscaler Full time

    About Zscaler: At Zscaler, our Engineering team has developed the largest cloud security platform globally, and we continue to innovate. With over 100 patents and ambitious plans for service enhancement and global expansion, our team has established us as a leader in cloud security, serving more than 15 million users across 185 countries. We invite you to...


  • San Jose, California, United States Zscaler Full time

    About Zscaler: At Zscaler, our Engineering team has developed the largest cloud security platform globally, and we continue to innovate. With over 100 patents and ambitious plans for service enhancement and global expansion, our team has established us as the leader in cloud security, serving more than 15 million users across 185 countries. We invite you to...


  • San Jose, California, United States Zscaler Full time

    About Us: Zscaler has developed the world's largest cloud security platform, continually innovating and expanding our services. With a robust portfolio of over 100 patents and ambitious plans for global growth, our team has established itself as a leader in cloud security, serving more than 15 million users across 185 countries. We are looking for talented...


  • San Francisco, California, United States AutoRABIT Holding Inc. Full time

    Job Overview. About AutoRABIT: AutoRABIT is a rapidly expanding SaaS company recognized as the premier provider of Salesforce DevSecOps solutions tailored for regulated sectors such as finance, insurance, and healthcare. Our platform empowers developers to streamline their workflows, enhancing productivity and accelerating release cycles while adhering to...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a cutting-edge technology company that specializes in democratizing access to AI innovation. Our mission is to empower organizations to solve complex unstructured data problems and unlock new business opportunities. Our Team: We are a team of passionate and innovative professionals who are dedicated to building scalable and reliable...


  • San Francisco, California, United States AutoRABIT Holding Inc. Full time

    Job Overview. About AutoRABIT: AutoRABIT is a rapidly expanding SaaS provider and a prominent leader in the Salesforce DevSecOps platform tailored for regulated sectors such as finance, insurance, and healthcare. Our solutions empower developers to streamline their daily operations, enhancing productivity and accelerating release cycles while adhering to...


  • San Jose, California, United States VDart Inc Full time

    Job Overview. Position: Lead Site Reliability Engineer. Location: San Jose, CA (Hybrid Work Model). Contract Duration: 6+ months. Experience Required: 14+ years. Role Summary: We are in search of a highly experienced and proactive Site Reliability Engineer Consultant. In this pivotal role, you will be responsible for: Key Responsibilities: Enhancing the reliability,...


  • San Jose, California, United States VDart Inc Full time

    Job Overview. Position: Lead Site Reliability Engineer. Location: San Jose, CA (Hybrid Work Model). Contract Duration: 6+ months. Experience Required: 14+ years. Role Summary: We are in search of a highly experienced and proactive Site Reliability Engineer Consultant. In this capacity, you will be responsible for: Key Responsibilities: Enhancing the reliability,...


  • San Francisco, California, United States AutoRABIT Holding Inc. Full time

    Job Overview. About AutoRABIT: AutoRABIT is a rapidly expanding SaaS company recognized as the premier provider of Salesforce DevSecOps solutions tailored for regulated sectors such as finance, insurance, and healthcare. Our offerings empower developers to streamline their daily operations, enhancing productivity and accelerating release cycles while adhering...


  • San Diego, California, United States Onebrief, Inc Full time

    About Onebrief, Inc.: Onebrief, Inc. is a cutting-edge technology company that specializes in developing innovative solutions for military planning and operations. Our flagship product, Onebrief, is an all-in-one tool that streamlines the planning process, enabling users to create and manage complex plans with ease. Job Summary: We are seeking a highly skilled...


  • San Francisco, California, United States Orb Full time

    About Orb: Orb is a pioneering company that provides cutting-edge infrastructure solutions to businesses, empowering them to unlock their revenue potential. Our mission is to revolutionize the way companies approach billing and invoicing, making it a seamless and efficient process. Role & Impact: As a Site Reliability Engineer at Orb, you will play a critical...


  • San Diego, California, United States Dexcom Full time

    About Dexcom: Founded in 1999, Dexcom, Inc. (NASDAQ: DXCM) is a pioneer in the development and marketing of Continuous Glucose Monitoring (CGM) systems designed for use by individuals with diabetes and healthcare professionals. As a leader in the transformation of diabetes management, Dexcom is committed to providing innovative CGM technology that empowers...


  • San Diego, California, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Analytics team at Apple. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our data analytics applications and infrastructure. Key Responsibilities: Design, develop, and maintain complex data infrastructure at the...


  • San Diego, California, United States Platform Science Full time

    About Us: At Platform Science, we are dedicated to connecting all aspects of mobility. Established in 2015, our open IoT platform collaborates with forward-thinking fleets, application developers, vehicle manufacturers, and equipment providers within the transportation sector to deliver groundbreaking solutions for supply chain professionals worldwide. Our...


  • San Diego, California, United States Platform Science Full time

    Company Overview: At Platform Science, we are dedicated to revolutionizing connectivity in the transportation sector. Established in 2015, our open IoT platform collaborates with forward-thinking fleets, application developers, vehicle manufacturers, and equipment providers to deliver groundbreaking solutions for supply chain professionals worldwide. Our...


  • San Diego, California, United States Platform Science Full time

    About Us: At Platform Science, we are dedicated to revolutionizing the transportation industry through innovative IoT solutions. Established in 2015, our open platform collaborates with forward-thinking fleets, application developers, vehicle manufacturers, and equipment providers to enhance supply chain efficiency worldwide. Our workforce is a vibrant and...