Senior Cloud Infrastructure Engineer

6 days ago


Santa Clara, California, United States NVIDIA Full time

We are seeking a highly skilled Senior Systems Engineer to work on scaling our cloud compute platform for Autonomous Vehicles (AV). Our platform provides access to hundreds of petabytes of data and exascale GPU and CPU compute for a range of AV workloads, including data ingestion, data processing, and model training.

We are embarking on building the next generation of the platform and are looking for strong engineers to join us on this journey.

Key Responsibilities:

  • Enhance and scale our compute platform to support diverse workloads on GPUs and CPUs
  • Design and build scalable and distributed services to power large scale workloads
  • Design and build scalable tools to efficiently operate services and hardware clusters
  • Collaborate with multiple teams to understand their needs, and build functionality that improves their user experience and productivity
  • Participate in operations, on-call, and user support

Requirements:

  • BS/MS/PhD in Computer Science, Engineering or other technical fields or equivalent experience
  • 6+ years of experience developing and operating backend systems at scale
  • Proficiency in Golang and distributed systems
  • Deep care for user experience
  • Strong collaboration and communication skills
  • Extremely motivated, highly passionate, and curious about state-of-the-art technologies, with a habit of following them closely
  • Strong willingness to learn, listen to diverse opinions, and contribute to an inclusive and growth-oriented culture

Preferred Qualifications:

  • Prior background in building AI Infrastructure for Autonomous Vehicles
  • Familiarity with HPC and workload managers (e.g. SLURM)
  • Experience with workflow orchestration systems (e.g. Flyte, Kubeflow Pipelines, Airflow)
  • Experience managing and deploying services on the cloud (e.g. AWS, GCP)
  • Open source contributions

The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


