Senior DevOps and Automation Engineer

2 weeks ago


Santa Clara, United States NVIDIA Full time

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions from artificial intelligence to autonomous cars.

We are the GPU Communications Libraries and Networking team at NVIDIA. We deliver libraries like NCCL and NVSHMEM for Deep Learning and HPC applications. We are looking for a motivated DevOps and Automation Engineer to help us increase our execution efficiency. Most DL and HPC applications run on large clusters with high-speed networking (InfiniBand, RoCE). This is an outstanding opportunity that goes beyond traditional DevOps roles and responsibilities. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

What you will be doing:

As a Senior Software Engineer in the GPU Communications Group, you will utilize your knowledge and expertise in high availability network software to create, enhance, and maintain our GPU communication solutions. You will:

  • Maintain and improve CI/CD systems (GitLab, GitHub, Perforce)
  • Develop tools and automation to deploy testing on new systems and platforms, including cloud platforms (Azure, AWS, GCP, etc.)
  • Maintain internal cluster servers and InfiniBand/RoCE networks
  • Collect large volumes of performance data; build tools and infrastructure to visualize and analyze it
  • Collaborate with a very dynamic team across multiple time zones

What we need to see:
  • B.S. or M.S. in Computer Science or a related field, and 5+ years of relevant experience
  • Excellent C/C++ programming and debugging skills
  • Expert in a scripting language, preferably Python
  • Proficient with Linux fundamentals
  • Familiar with containers, cloud provisioning and scheduling tools (Docker, Docker Swarm, Kubernetes, SLURM, Ansible)
  • Adaptability and passion to learn new areas and tools
  • Flexibility to work and communicate effectively across different teams and time zones

Ways to stand out from the crowd:
  • Experience conducting performance benchmarking and developing infrastructure on HPC clusters. Prior system administration experience, especially for large clusters
  • Good understanding of InfiniBand/RoCE networks and experience debugging network configuration issues
  • Familiarity with CUDA programming and/or GPUs. Experience with Deep Learning frameworks such as PyTorch and TensorFlow. Deep understanding of technology and passion for what you do

The base salary range is 144,000 USD - 270,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
  • Automation Engineer

    2 days ago


    Santa Clara, United States Ursus Inc Full time

    JOB TITLE: Automation Engineer - DevOps. LOCATION: Santa Clara, CA - HYBRID. DURATION: 8 months. PAY RANGE: $64-$74/hr. COMPANY: Our client is a leading provider of enterprise cloud applications for finance and human resources. Job Summary: Software Automation Engineers are in charge of designing and executing automated test scripts / programs and check...

  • DevOps Engineer

    5 days ago


    Santa Clara, United States SPECTRAFORCE Full time

    Title: DevOps Engineer. Location: Santa Clara, CA - Must be hybrid onsite. Duration: 8 Months - Potential to extend but not likely. Interview Process: 2 rounds via Zoom. Job Summary: Software Automation Engineers are in charge of designing and executing automated test scripts / programs and check framework enhancements, presenting process improvement...


  • Santa Clara, United States DSP Concepts Full time

    DSP Concepts is the Silicon Valley-based leader in the Audio of Things (AoT) market and the creator of Audio Weaver, the Audio Experience platform that makes audio innovation easy. DSP Concepts equips and supports engineers with real-time workflows to quickly stand up prototypes, collaborate and modify designs across teams, and...


  • Santa Clara, United States Sigmaways Inc Full time

    The DevOps Engineer is a key member of the engineering and DevOps groups and will apply knowledge of design principles and practices in the implementation of complex, enterprise-scale software systems. General responsibilities include design concept generation, participating in and leading design reviews for components or features,...


  • Santa Clara, United States Professional Recruiters Full time

    Principal Software Engineer, Santa Clara, California or Tempe, Arizona Come join a growing bank at the heart of the innovation, technology, green tech and life sciences space. We continue to expand our global footprint and our banking technology is at the core of everything we do. Work within our DevOps team and be part of a group that helps ensure our...


  • Santa Clara, United States ALOIS LLC Full time

    Job Title: Software Engineering - Engineer, Senior Staff | 6246. Location: San Diego, CA. Duration: 12+ Months (Possibility of extension). Job Description: Top 6 requirements: 1. Continuous Integration Development 2. Automated Test Development 3. Modern CI/CD pipeline development with GitLab CI (Continuous Integration), GitHub Actions...


  • Santa Clara, United States NVIDIA Full time

    NVIDIA is using the power of high performance computing and AI to accelerate digital biology. We are seeking passionate and hardworking individuals to help us realize our mission. We are seeking an experienced and highly skilled DevOps and Release Engineer to join our team. As a DevOps and Release Engineer, you will play a critical role in ensuring the...


  • Santa Clara, United States NVIDIA Full time

    We are now looking for a Senior Python Automation Engineer for our Deep Learning Algorithms team! Join the team building software which will be used by the entire world of AI. Work with high class software engineers to implement a large scale toolset that tests deep learning models and frameworks on the most powerful computers. Ability to work in a...


  • Santa Clara, United States Halo Industries Full time

    As a Senior Systems Engineer at Halo Industries, you will play a crucial role in the development and integration of our groundbreaking semiconductor manufacturing technology. Leveraging your expertise in system design, integration, and automation, particularly within the semiconductor industry or related fields, you will contribute to the evolution of our...


  • Santa Clara, United States Wipro Full time

    About Wipro: Wipro Limited (NYSE: WIT, BSE: 507685, NSE: WIPRO) is a leading technology services and consulting company focused on building innovative solutions that address clients’ most complex digital transformation needs. We leverage our holistic portfolio of capabilities in consulting, design, engineering, operations, and emerging technologies to help...


  • Santa Clara, United States SPECTRAFORCE Full time

    Position: DevOps CI/CD Verification Engineer. Location: Remote (Bay Area, CA). Duration: 12 months. Job Description: We are seeking a highly experienced DevOps CI/CD Verification Engineer to drive hardware/software verification, emulation automation, SOC (System on Chip) automation. This pivotal role involves designing, building, and deploying robust...


  • DevOps Engineer

    2 days ago


    Santa Clara, United States Tech Mahindra Full time

    Role: DevOps Engineer. Location: Santa Clara, CA (mandatory to be in office at least 4 days/week, no exception). JD Summary: Skilled engineer to join our team and contribute to the successful implementation and management of our cloud-based infrastructure. Stay up to date with the latest trends and advancements in Python, APIs, web server administration,...