Site Reliability Engineering Manager

2 months ago


San Jose, California, United States NetApp Full time
About NetApp

NetApp is a leader in data infrastructure, empowering customers to turn challenges into opportunities. Our innovative approach combines fresh thinking with proven expertise to help customers unlock the full potential of their data.

We're a company that values diversity, openness, and collaboration. Our employees are passionate about solving complex problems and driving business success. If you're a motivated individual who thrives in a fast-paced environment, we want to hear from you.

Job Summary

The Site Reliability Engineering Manager will lead a high-performing team responsible for ensuring the reliability, performance, and efficiency of our critical systems. This role requires a unique blend of engineering and operations expertise, with a strong background in software development, systems engineering, and leadership.

Key Responsibilities
  • Lead and mentor a team of SREs, fostering a culture of continuous improvement and innovation.
  • Collaborate with product and engineering teams to design and implement scalable solutions.
  • Develop and maintain a reliable monitoring and alerting system to detect and mitigate issues proactively.
  • Drive incident management processes and conduct post-mortem analyses to prevent future outages.
  • Manage priorities, projects, and the overall workflow of the SRE team.
  • Ensure compliance with security best practices and company policies.
  • Stay ahead of industry trends and emerging technologies to continuously improve system reliability and performance.
Requirements
  • Minimum of 8 years of experience in SRE, DevOps, or similar roles, including at least 2 years in a leadership position with direct reports.
  • Experience leading geographically dispersed teams.
  • Proficiency in programming languages such as Python, Go, or Java.
  • Extensive experience with cloud services (AWS, GCP, Azure) and container orchestration tools (Kubernetes, Docker).
  • Solid understanding of CI/CD pipelines and automation tools (Jenkins, Ansible, Terraform).
  • Exceptional knowledge of observability tools and experience setting up architecture for proactive monitoring of the product.
  • Proven track record of designing and implementing scalable, high-availability systems.
  • Exceptional problem-solving skills and the ability to work under pressure.
  • Excellent communication and team-building skills.
Education

Bachelor's degree in computer science, engineering, or a related field; Master's preferred.

Compensation
The base salary range for this position is $180,200–$250,300 and will be determined by the candidate's location, qualifications, experience, and education. Final compensation packages are competitive and in line with industry standards, reflecting a variety of factors, and include a comprehensive benefits package. This may cover Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off (PTO), various Leave options, Performance-Based Incentives, an employee stock purchase plan, and/or restricted stock units (RSUs), with all offerings subject to regional variations and governed by local laws, regulations, and company policies. Benefits may vary by country and region, and further details will be provided as part of the recruitment process.

Equal Opportunity Employer:

NetApp is firmly committed to Equal Employment Opportunity (EEO) and to compliance with all federal, state, and local laws that prohibit employment discrimination based on age, race, color, gender, sexual orientation, gender identity, national origin, religion, disability, genetic information, pregnancy, protected veteran status, or any other protected classification.

Why NetApp?

We're a company that values innovation, collaboration, and customer success. Our employees are passionate about solving complex problems and driving business growth. If you're a motivated individual who thrives in a fast-paced environment, we want to hear from you.

We offer a comprehensive benefits package, including Health Insurance, Life Insurance, Retirement or Pension Plans, Paid Time Off (PTO), various Leave options, Performance-Based Incentives, an employee stock purchase plan, and/or restricted stock units (RSUs). Benefits may vary by country and region, and further details will be provided as part of the recruitment process.

NetApp is an Equal Opportunity Employer:

We're committed to diversity, equity, and inclusion. We believe that a diverse and inclusive workplace is essential to our success and to creating a culture of belonging. We welcome applications from qualified candidates from all backgrounds and perspectives.