Site Reliability Engineer

5 days ago


San Jose, California, United States Western Digital Full time
Job Description

Western Digital is seeking a highly skilled Site Reliability Engineer - DevOps to join our team. As a key contributor to our engineering process, you will be responsible for delivering the software development tools and infrastructure that empower our engineering teams to develop and deliver high-quality products quickly.

You will play a pivotal role in ensuring the reliability, scalability, and performance of our IT infrastructure and DevOps tools. Your technical expertise, adaptability, and commitment to excellence will drive the success of our stakeholders and enable them to develop and deliver high-quality products faster, without sacrificing security, development velocity, stability, code quality, or code health.

Key Responsibilities
  • Design, implement, and continuously improve monitoring and observability solutions to ensure effective and real-time visibility into system performance.
  • Advocate for and implement best practices in SRE, DevOps, and automation, with a focus on enhancing platform stability and performance.
  • Lead automation efforts to streamline processes, reduce manual tasks, and improve operational efficiency.
  • Contribute to the architecture and design of systems and applications, aligning them with reliability and scalability goals.
  • Provide technical ownership in the SRE team, fostering a collaborative and growth-oriented environment.
  • Take ownership of system reliability, meet Service Level Objectives (SLOs), and ensure customer satisfaction.
  • Work closely with Engineering teams to understand customer requirements and collaborate on solutions.
  • Stay updated with emerging technologies and adapt quickly to evolving requirements and challenges.
  • Continuously upskill in newer technologies and share knowledge within the team.
  • Collaborate effectively with team members and contribute to a positive team culture.
  • Maintain thorough and well-organized documentation of systems and processes.
Qualifications

  • B.S. in C.S., I.T., E.E., or M.E., plus 6 to 10 years of hands-on experience with DevOps tools and SRE practices.
  • Administration experience with DevOps tools such as Artifactory, Jenkins, Git, Black Duck, and SAST/DAST tools is required.
  • A very good understanding of infrastructure at the server, VMware, storage, and networking levels.
  • Exceptional analytical, problem-solving, and troubleshooting skills for managing complex process and technology issues.
  • Extensive experience with Ansible automation, shell scripting, Python, and other configuration management tools like Terraform is required.
  • Experience developing and customizing CI/CD pipelines and onboarding applications with varying requirements.
  • Experience with monitoring enhancements and metrics dashboards using tools such as Icinga, Splunk, Prometheus, and Grafana is required.
  • Good to have: experience with containerization technologies such as Docker and Kubernetes.
  • An automation-first mindset.
  • A focus on embedding security postures in systems.
  • Working experience with HAProxy, load balancers, LDAP/SSO integration, and security endpoint configurations is required.
  • Knowledge of cloud computing platforms (e.g., AWS, Azure, GCP) is a plus.
  • Excellent communication and collaboration skills.



  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement automation scripts using...


  • San Jose, California, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement automation scripts using shell,...


  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...


  • San Jose, California, United States Altius Technologies, Inc. Full time

    Job Title: Site Reliability Engineer. Altius Technologies, Inc. is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure and systems that support our business applications. Key Responsibilities: Design and implement automation...


  • San Jose, California, United States ApTask Full time

    About ApTask: ApTask is a leading global provider of workforce solutions and talent acquisition services, dedicated to shaping the future of work. As an African American-owned and Veteran-certified company, ApTask offers a comprehensive suite of services, including staffing and recruitment solutions, managed services, IT consulting, and project management. With...


  • San Jose, California, United States Adobe Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Adobe. As a key member of our Cloud Engineering team, you will play a critical role in designing, deploying, and optimizing our cloud services. Key Responsibilities: Develop software and tools to improve the reliability and performance of our cloud services. Collaborate...


  • San Jose, California, United States Cisco Full time

    About the Role: Cisco is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. You will work closely with our development teams to identify and resolve issues, and collaborate with other teams to...


  • San Jose, California, United States Splunk Full time

    About Splunk: Splunk is a leading provider of cloud-based data analytics and monitoring solutions. Our mission is to make machine data accessible, usable, and valuable to everyone. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our Cloud TechOps team. As a Site Reliability Engineer, you will be responsible for ensuring the...


  • San Jose, California, United States Adobe Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Adobe. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our cloud-based services. Responsibilities: Ensure the highest level of uptime and Quality of Service (QoS) to Adobe's customers through...


  • San Jose, California, United States Adobe Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Adobe. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, availability, and performance of our cloud-based services. Key Responsibilities: Ensure the highest level of uptime and Quality of Service (QoS) to Adobe's customers through...


  • San Jose, California, United States Trianz Full time

    About Trianz: Trianz is a leading-edge technology platforms and services company that accelerates digital transformations at Fortune 100 and emerging companies worldwide in data & analytics, digital experiences, cloud infrastructure, and security. Our Vision: We believe that companies around the world face three challenges in their digital transformation journeys...


  • San Jose, California, United States Tik Tok Full time

    Job Title: Site Reliability Engineer. At TikTok, we're seeking Site Reliability Engineers (SREs) to join our monetization technology team. Our team works on building and running large-scale, globally distributed, fault-tolerant ads systems. SREs keep the systems up and running...


  • San Jose, California, United States Adobe Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Adobe. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud services. Key Responsibilities: Develop software and tools to design, deploy, and optimize cloud services. Provide hands-on technical...


  • San Jose, California, United States Altius Technologies Inc Full time

    Job Description: At Altius Technologies Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for creating and supporting automation scripts for infrastructure deployments, validations, and monitoring to improve operational tasks. Key Responsibilities: Design and implement...


  • San Jose, California, United States Adobe Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Adobe. As a key member of our Cloud Engineering team, you will play a critical role in designing, deploying, and optimizing our cloud services. Key Responsibilities: Develop software and tools to improve the reliability and performance of our cloud services. Provide...


  • San Jose, California, United States ByteDance Full time

    About the Role: ByteDance is seeking a highly skilled Site Reliability Engineer to join our Applied Machine Learning team. As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our machine learning services, which are used by hundreds of millions of people around the world. Responsibilities: Design and...


  • San Jose, California, United States NetApp Full time

    Job Summary: As a Site Reliability Engineer at NetApp, you will be responsible for managing, supporting, and maintaining a reliable environment for our site. This involves ensuring the stability and security of multiple open-source systems and platforms that are run or operated in that environment. Key Responsibilities: Building and supporting a reliable site for...


  • San Jose, California, United States Tik Tok Full time

    Job Title: Cloud Site Reliability Engineer. We are seeking a highly skilled Cloud Site Reliability Engineer to join our team at TikTok. As a Cloud Site Reliability Engineer, you will be responsible for building, expanding, and operating ByteDance's global infrastructures, including large-scale systems in public and private clouds, data centers, and content...


  • San Jose, California, United States ByteDance Full time

    About the Role: ByteDance is seeking a highly skilled Site Reliability Engineer to join our Applied Machine Learning team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and performance of our machine learning services. Responsibilities: Design and implement large-scale systems to support machine learning...


  • San Jose, California, United States Adobe Full time

    Job Title: Site Reliability Engineer. At Adobe, we're looking for a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based services. Key Responsibilities: Design, develop, and deploy cloud-based services and...