Reliability Engineering Manager

20 hours ago


Santa Clara, California, United States Pure Storage, Inc. Full time

Drive product reliability at scale as a Software Engineering Manager for Pure Storage's Fleet Reliability Engineering team. Lead a group dedicated to ensuring the highest reliability of customer FlashBlade systems.

Your mission will be to manage and improve the systems and processes that monitor and respond to fleet reliability issues, whether through reactive, proactive, or predictive measures.

Main Responsibilities:
  • Lead a team of forensic engineers focused on improving the reliability, performance, and resilience of Pure's products.
  • Own and drive initiatives to improve fleet reliability by fixing critical bugs, enhancing test automation, and developing new features and processes.
  • Define the roadmap for internal tools development, focusing on predictive monitoring, health checks, performance troubleshooting, and failure investigation tools.
  • Coordinate and approve maintenance releases across all software lines, ensuring engineering teams contribute effectively while maintaining high-quality standards.
  • Collaborate with hardware, software, support, and customer-facing teams to triage customer issues, assign ownership, and set expectations for resolution.
  • Manage the team's priorities, career development, and growth while fostering a culture of innovation and customer focus.
  • Oversee product reliability metrics to measure success and adjust priorities to ensure continuous improvement.
  • Own the global software engineering on-call program, coordinating efforts across multiple teams and ensuring its success when support escalates critical issues.
  • Report key product and customer issue updates to executive leadership.
Requirements:
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 8+ years of experience in software engineering with a focus on systems, storage, networking, or reliability.
  • 4+ years of experience in a leadership or management role within software engineering.
  • Proven experience leading teams through complex technical challenges.
  • Excellent project management and organizational skills, with a keen ability to prioritize and adapt under pressure.
  • Strong interpersonal communication skills, capable of interacting effectively with both technical and non-technical stakeholders.
Compensation and Benefits:
  • The annual base salary range is $207,000 to $312,000.


  • Santa Clara, California, United States NVIDIA Full time

    As a Senior Manager in Site Reliability Engineering (SRE) at NVIDIA, you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, emphasizing high efficiency and availability. This role spans various domains, including software and systems engineering, cloud-scale storage, data management, and services. SRE...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Job
    Palo Alto Networks is seeking an experienced Reliability Engineer to join our team. The ideal candidate will have a strong background in reliability engineering and networking products. The successful candidate will be responsible for establishing controls and documenting procedures related to NPI product quality and reliability, aiding Development...


  • Santa Clara, California, United States OmniVision Technologies Full time

    OmniVision Technologies, a leading CMOS image sensor manufacturer, is seeking a highly skilled Staff Reliability Engineer to join its team in Santa Clara, CA.
    About the Role
    We are looking for an exceptional engineer with expertise in reliability systems to help us design and develop high-quality image sensors. As a Staff Reliability Engineer, you will play a...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role
    Palo Alto Networks is seeking an experienced Principal Site Reliability Engineer to join our Cloud Infrastructure team. As a key member of our team, you will be responsible for designing, building, and maintaining scalable and reliable cloud infrastructure to support our mission-critical applications.
    Key Responsibilities
    Design and implement...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About Palo Alto Networks
    Our mission is to protect our digital way of life. We strive to be the cybersecurity partner of choice, working tirelessly to safeguard our customers' public cloud workloads with resilient, scalable, and always-on firewall solutions. We're a team of innovators who challenge the status quo and drive meaningful change in the...


  • Santa Clara, California, United States OmniVision Technologies Full time

    About OmniVision Technologies
    We are a leading manufacturer of CMOS image sensors based in Santa Clara, CA.
    Job Summary
    We are seeking a highly skilled Reliability Systems Expert to join our team. In this role, you will be responsible for ensuring the high quality and reliability of our image sensors.
    Responsibilities
    Develop and implement reliability testing...


  • Santa Clara, California, United States Cryptoware Technologies Inc Full time

    Job Overview
    Cryptoware Technologies Inc is seeking a highly skilled and experienced Site Reliability Engineer to lead the global expansion of our infrastructure.


  • Santa Clara, California, United States Forward Networks Inc Full time

    Forward Networks Inc is revolutionizing network management with its cutting-edge Forward Enterprise platform. This innovative technology delivers a digital twin of the network, based on a mathematical model. The platform scales to support hundreds of thousands of devices, whether cloud, hybrid cloud, or on-premises. It serves as a single source of truth for...


  • Santa Barbara, California, United States Invoca Full time

    Company Overview:
    Invoca is a leading AI and machine learning-powered Conversation Intelligence company, with over 300 employees and 2,000+ customers. The company has achieved significant growth, reaching $100M in revenue and raising over $184M from top venture capitalists.
    About the Role:
    The Senior Site Reliability Engineer will be part of the...


  • Santa Clara, California, United States ZipRecruiter Full time

    Company Overview
    Palo Alto Networks is a leading cybersecurity company that protects the digital way of life. Our mission is to be the cybersecurity partner of choice, and we're committed to shaping the future of cybersecurity.
    Salary
    The estimated annual salary for this position is $217,750, based on industry standards and location. The compensation package...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description
    Palo Alto Networks is seeking an experienced and highly motivated Senior Director to lead our hardware quality and compliance engineering team. This role will be instrumental in ensuring our hardware products meet stringent quality standards, regulatory requirements, and customer expectations. The ideal candidate must be self-motivated and...


  • Santa Clara, California, United States Apple Full time

    Job Description
    Company: Apple
    Job Title: Software Engineering Manager
    Department: Software Delivery
    Location: Santa Clara, California, United States
    We are seeking a highly skilled Software Engineering Manager to lead the Device Services Engineering team within Software Delivery at Apple. The successful candidate will be responsible for supervising and mentoring...

  • Software Engineer

    1 week ago


    Santa Clara, California, United States Forward Networks Inc Full time

    Forward Networks Inc is a pioneering company that revolutionizes the way large networks are managed. Their advanced software delivers a digital twin of the network, enabling network operators to verify intent, predict network behavior, avoid outages, and simplify network management. This innovative platform can be implemented on premises, in the cloud, or in...


  • Santa Clara, California, United States Roche Holdings Inc. Full time

    At Roche Holdings Inc., we are committed to fostering diversity, equity and inclusion, representing the communities we serve. When dealing with healthcare on a global scale, diversity is an essential ingredient to success. We believe that inclusion is key to understanding people's varied healthcare needs. A healthier future is what drives us to innovate. To...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Palo Alto Networks is a leading cybersecurity company that prioritizes quality and reliability in its hardware products. We are seeking an experienced and highly motivated Sr Director to lead our hardware quality and compliance engineering team. This role will be instrumental in ensuring our hardware products meet stringent quality standards, regulatory...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About Us
    Palo Alto Networks is a pioneering cybersecurity company dedicated to protecting the digital way of life. Our mission is to be the partner of choice for organizations seeking robust cybersecurity solutions. We foster a culture of innovation, collaboration, and continuous improvement, where employees feel empowered to contribute their unique ideas and...


  • Santa Clara, California, United States Roche Holdings Inc. Full time

    Job Overview
    A Principal DevOps Engineer is sought to lead the QCS Algorithms deployments in Roche's Digital Pathology Algorithm DevOps team. Collaborating with developers, product owners, release train engineers, and other DevOps team members, this role involves capacity planning, high availability engineering, performance tuning, and automation/tools...


  • Santa Clara, California, United States Forward Networks Inc Full time

    Job Overview
    We are seeking a seasoned Engineering Manager to lead our Platforms team across two geographic regions. As a hands-on leader, you will oversee the development and operations of a scalable, reliable, and high-performing platform.


  • Santa Clara, California, United States Inflection Full time

    Unlock the Future of Cryptographic Computing
    About Fabric Cryptography
    Fabric Cryptography is a pioneering company that's revolutionizing trust and privacy with cutting-edge cryptography technology. As a fast-growing Series A deep tech firm, we're committed to unlocking the next generation of cryptography.
    The Role
    We're seeking an accomplished System Leader to...


  • Santa Clara, California, United States Amazon Full time

    About the Role
    We are seeking a highly skilled Software Development Engineer to join our Usability & Interfaces team in AWS HealthOmics. In this role, you will be responsible for developing capabilities that champion how customers interact with our service, including API development, workflow and performance optimizations, and interfacing with...