Senior Storage Engineer

3 weeks ago


Santa Clara, California, United States NVIDIA Full time

NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization.

The company's GPU serves as the visual cortex of modern computers and is at the heart of its products and services.

NVIDIA is looking for skilled professionals to help accelerate the next wave of artificial intelligence.

As a Senior Site Reliability Engineer focused on High-Performance Computing (HPC) storage, you will play a crucial role in designing, implementing, and optimizing on-prem HPC storage solutions while harnessing the power of cloud computing.

You will be responsible for crafting and deploying distributed storage solutions, building automation tools, and ensuring the efficient operation of our growing IT ecosystem.

You will collaborate closely with engineering teams to align infrastructure with their evolving needs, document best practices, and contribute to the success of groundbreaking projects.

Key Responsibilities:

  • Design and implement on-prem HPC infrastructure supplemented with cloud computing to support the growing IT needs of NVIDIA.
  • Design and implement scalable, efficient storage solutions tailored for data-intensive applications, optimizing for both performance and cost-effectiveness.
  • Develop tooling to automate the deployment and management of large-scale infrastructure environments, automate operational monitoring and alerting, and enable self-service consumption of resources.
  • Document general procedures and practices for distributed file systems, and perform technology evaluations in this area.
  • Collaborate across teams to better understand developers' workflows and gather their infrastructure requirements.
  • Influence and guide methodologies for building, testing, and deploying applications to ensure optimal performance and resource utilization.

Requirements:

  • BS in Computer Science (or equivalent experience) with 8+ years of relevant experience, MS with 5+ years of experience, or Ph.D. with 3 years of experience.
  • 8+ years of experience crafting technology solutions and resolving performance bottlenecks for HPC applications.
  • Experience with one or more parallel or distributed file systems, such as Lustre or GPFS, is a must.
  • Experience designing, deploying, and managing enterprise NAS solutions such as NetApp or Pure Storage.
  • Experience designing and managing large-scale on-prem object storage clusters.
  • Programming and scripting experience in Python or Golang is a must.
  • Strong experience operating services in at least one of the leading cloud environments (AWS, Azure, or GCP).
  • Excellent communication and collaboration skills.

Preferred Qualifications:

  • Background with RDMA (InfiniBand or RoCE) fabrics.
  • Experience with multiple monitoring stacks, such as Prometheus+Grafana, Elasticsearch+Kibana, Splunk, and Zabbix, and familiarity with newer and emerging monitoring products.
  • Prior experience with HPC cluster management tools such as Slurm, PBS, and LSF.
  • Experience with containerization technologies such as Docker, Mesosphere DC/OS, and Kubernetes (k8s).

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.

The base salary range is 164,000 USD - 310,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.



  • Santa Clara, California, United States Pure Storage Full time

    Join Pure Storage's Cloud Storage Team. Pure Storage is seeking a highly skilled Senior Product Manager to lead our public cloud storage business. As a key member of our team, you will be responsible for developing and driving our public cloud storage-as-a-service business, working closely with cross-functional teams to identify and prioritize product features...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Site Reliability Engineer - HPC Storage. NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. Our work opens up new universes to explore, enables unique creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to...

  • Mechanical Engineer

    3 weeks ago


    Santa Clara, California, United States Pure Storage Full time

    About the Role: Pure Storage is seeking a skilled Mechanical Engineer to drive and manage all mechanical aspects of our DirectFlash Modules (DFMs) storage devices. The ideal candidate will be detail-oriented and interested in designing in-house with minimal guidance. Key Responsibilities: Lead all aspects of in-house drive designs on mechanical and...


  • Santa Clara, California, United States Pure Storage Full time

    Join Pure Storage as a Senior Software Engineering Manager. Pure Storage is seeking a Senior Software Engineering Manager to lead the development of automated test infrastructure for our software-based all-flash storage arrays. As a key member of our engineering team, you will collaborate with software and hardware engineers to define test automation...


  • Santa Clara, California, United States Pure Storage Full time

    Join Pure Storage's Team of Innovators. Pure Storage is a leading provider of all-flash storage solutions, and we're seeking a talented Engineering Manager to help drive the development of our products. As a key member of our team, you will be responsible for leading a team of software developers in the creation of our product, collaborating with product...

  • Senior Manager

    4 weeks ago


    Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly experienced Senior Manager to lead our Site Reliability Engineering team in designing, constructing, and maintaining production systems. As a key member of our team, you will be responsible for overseeing the implementation of reliable storage solutions, efficient data management, and delivering associated services to...


  • Santa Clara, California, United States Pure Storage Full time

    About the Role: We are seeking a highly skilled Quality Systems Engineer to join our FlashArray team. As a Quality Systems Engineer, you will be responsible for ensuring the highest quality releases of our industry-leading storage products. Key Responsibilities: Plan, test, and qualify shipping releases; work with Support and Escalation teams to help reproduce...


  • Santa Clara, California, United States Pure Storage Full time

    About the Role: Pure Storage is seeking a seasoned Engineering Manager to lead the development of automated test infrastructure for our software-based all-flash storage arrays. As a key member of our engineering team, you will collaborate with software and hardware engineers to define test automation requirements, participate in product design reviews, and...


  • Santa Clara, California, United States Pure Storage Full time

    Transform Data Platforms with Us. Pure Storage is driving digital transformation and innovation in the data storage industry. We're seeking a talented UX Designer to join our team and help shape the user experience of our Modern Data platforms and services. What You'll Do: Collaborate with cross-functional teams to design key features and drive our product's...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior AI-HPC Storage Solutions Architect. NVIDIA is a leader in the field of artificial intelligence and high-performance computing, and we are seeking a highly skilled Senior AI-HPC Storage Solutions Architect to join our team. About the Role: We are looking for an expert in designing and implementing high-performance storage solutions for our AI...

  • Senior Manager

    4 weeks ago


    Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly experienced Senior Manager to lead our Storage Systems team. As a key member of our Site Reliability Engineering (SRE) organization, you will be responsible for designing, implementing, and maintaining scalable and reliable storage systems to support our cloud infrastructure. Key Responsibilities: Lead a team of Storage SRE...

  • Firmware Engineer

    4 weeks ago


    Santa Clara, California, United States Pure Storage Full time

    Join Pure Storage's Team of Innovators. Pure Storage is a leader in the data storage industry, and we're looking for a talented Firmware Engineer to join our team. As a Firmware Engineer, you will be responsible for designing, implementing, and testing firmware for our DirectFlash SSD Modules. Key Responsibilities: Design and develop firmware for DirectFlash SSD...


  • Santa Clara, California, United States NVIDIA Full time

    About NVIDIA: NVIDIA has been a pioneer in computer graphics, PC gaming, and accelerated computing for over two decades. Our legacy of innovation is driven by exceptional people and phenomenal technology. We're now harnessing the power of AI to define the next era of computing. The DGX AI Cloud Team: We're crafting the future of AI and cloud technologies,...


  • Santa Clara, California, United States Pure Storage Full time

    Product Management Role at Pure Storage. Pure Storage is seeking a skilled Product Manager to lead the development of our enterprise storage solutions. As a Product Manager, you will be responsible for defining product and feature requirements, working closely with the engineering team, and managing product launch cycles. Key Responsibilities: Define product and...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is a leader in the field of computer graphics, PC gaming, and accelerated computing. We're now pushing the boundaries of AI to define the next era of computing. As a Senior Product Manager, you'll be part of the DGX AI Cloud team, which is crafting the future of AI and cloud technologies. Your role will be to develop a clear vision and strategy for our...


  • Santa Clara, California, United States Pure Storage Full time

    Pure Storage Security Risk Management Specialist. At Pure Storage, we're redefining the traditional approach to risk management, and we're looking for a seasoned Security Risk Analyst to join our growing team. As a key member of our Global Information Security Office (GISO), you'll play a critical role in driving maturity in security processes through policies...


  • Santa Clara, California, United States Pure Storage Full time

    Join Pure Storage's Team as a Senior Product Marketing Manager. Pure Storage is seeking a highly skilled Senior Product Marketing Manager to lead strategic product launches from conception through release and campaign delivery. As a key member of our team, you will prioritize the launch portfolio, drive alignment with executive leadership, and balance the...


  • Santa Clara, California, United States Pure Storage Full time

    Transform the Future of Storage. Pure Storage is revolutionizing the way businesses consume and interact with data. As a Senior Product Manager for our Digital Experience team, you will play a critical role in shaping the future of storage as a service. Key Responsibilities: Develop and evolve a comprehensive product strategy, aligning with business goals and...


  • Santa Clara, California, United States Pure Storage Inc. Full time

    Join Our Team as a Senior Product Manager. Pure Storage Inc. is seeking a highly skilled Senior Product Manager to lead the evolution of our Evergreen//One Storage-as-a-Service business. As a key member of our Product Management team, you will be responsible for driving adoption, growth, and efficiency, as well as developing new offerings and motions. Key...

  • Senior Data Engineer

    4 weeks ago


    Santa Clara, California, United States The Fountain Group Full time

    Job Summary: We are seeking a highly skilled Senior Data Engineer to join our team. As a key member of our data ingestion and processing platform team, you will be responsible for designing, developing, and implementing scalable data processing and storage solutions. Key Responsibilities: Design and implement data ingestion pipelines from multiple heterogeneous...