Senior AI Infrastructure Services Software Engineer

4 weeks ago


Santa Clara, CA, United States NVIDIA Full time

We are seeking a highly skilled Senior Infrastructure System Software Engineer with Kubernetes-based infrastructure experience to join our Omniverse Infrastructure team. The ideal candidate will have a solid understanding of system software design principles and experience in deploying, managing, optimizing, and scaling NVIDIA Omniverse Cloud, a platform-as-a-service (PaaS) that provides developers and enterprises a full-stack cloud environment to design, develop, deploy, and manage industrial Omniverse applications and workflows.

The candidate will work closely with cross-functional teams to design and develop common system software blocks within Kubernetes clusters (e.g., Custom Resource Definitions, Operators, and system plug-ins) to meet the highly challenging and multi-faceted requirements of NVIDIA Omniverse Cloud. These requirements include, but are not limited to, elasticity, multitenancy, high availability, fault tolerance, debuggability, operational efficiency, and sustainability of the cluster-level services needed to onboard and optimize Omniverse applications and workflows at large scale. A key feature of the workflows is the ability to compose one or more high-performance simulation/AI tasks, streaming Kit-based applications of various types, and elastic microservices via Cloud APIs.

What you will be doing:

  • Design and develop low-level system software solutions within Kubernetes to manage and schedule OVX cluster resources in order to power NVIDIA Omniverse Cloud (OVC).
  • Design and develop cluster-level system software solutions to map a wide range of Omniverse workloads to high-performance interactive tasks (Kit-based applications), elastic microservices, and simulation/AI tasks.
  • Collaborate with multiple Omniverse product teams to understand customer storage and compute requirements and build supporting infrastructure.
  • Work across organizational boundaries with diverse hardware and software engineers.
  • Proactively identify and address system software challenges in compute, networking, and storage resource utilization that affect OVC's availability, multi-tenancy, fault tolerance, debuggability, operational efficiency, and sustainability.

What we need to see:

  • 6+ years of hands-on system software engineering experience extending cluster-level services for large-scale Kubernetes
  • 4+ years of experience building large-scale, fault-tolerant distributed services
  • Experience with cloud infrastructure platforms like AWS, Azure, and Google Cloud
  • Strong systems programming skills, including optimizations using multi-threading, asynchronous programming, concurrency and parallelism, caching, and batching
  • Proficiency in Python, C/C++, and Golang
  • Working knowledge of elasticity techniques within Kubernetes
  • Deep understanding of cloud technologies, distributed compute systems, and distributed systems and microservices architecture
  • Master's or PhD in Computer Science or a related field (or equivalent experience)
  • Excellent interpersonal skills and ability to work successfully with multi-functional teams, principals, and architects across organizational boundaries and geographies

Ways to stand out from the crowd:

  • Expert knowledge of virtualization and containerization technologies like Docker, VMware, KVM, etc.
  • Strong knowledge of elasticity techniques within Kubernetes
  • Experience co-designing high-performance application workflows with the underlying cluster-level software such as Slurm and/or Kubernetes

The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Summary
Location: US, CA, Santa Clara; US, WA, Remote; US, FL, Remote; US, CA, Remote; US, WA, Redmond
Type: Full time



  • Santa Clara, United States NVIDIA Full time

    Join NVIDIA IT, where we are on a mission to build and deliver world-class platforms to optimize NVIDIA's IT infrastructure operations, encompassing IT asset management, configuration management, monitoring, logging, and incident management. Our platform ecosystem combines open-source tools, vendor products, and in-house innovations, with our ultimate...


  • Santa Clara, CA, United States NVIDIA Corporation Full time

    Senior AI-HPC Storage Engineer. Locations: US, CA, Santa Clara; US, MA, Westford; US, TX, Austin. Time type: Full time. Posted 22 days ago. Job requisition ID: JR1977545. NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming...


  • Santa Clara, CA, United States NVIDIA Full time

    Senior Software Engineer in Infrastructure Development. We are looking to hire a Senior Software Engineer who will work in the Test Solutions Group at NVIDIA developing manufacturing test solutions for Datacenter and Enterprise products. In this role, you will collaborate with other key hardware design teams to help enable NVIDIA's next generation GPU...


  • Santa Clara, CA, United States NVIDIA Corporation Full time

    Senior Software Engineer - HPC. Locations: US, CA, Santa Clara; US, MA, Westford; US, TX, Austin; US, NC, Durham. Time type: Full time. Posted 2 days ago. Job requisition ID: JR1979406. NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the...


  • Santa Clara, CA, United States NVIDIA Full time

    We are now looking for a highly motivated Senior Software Engineer for AI Streaming Software! Academic and commercial groups around the world are powering a revolution in artificial intelligence using deep learning techniques running on NVIDIA GPUs. Intelligent machines powered by AI computers that can learn, reason, and interact with people are no longer...


  • Santa Clara, CA, United States Nvidia Full time

    We are seeking a Vice President, Product Management, AI Cloud Infrastructure for DGX Cloud. If you are a customer-focused, technically adept leader driven by a passion for fostering innovation in AI and cloud computing and thrive in a multifaceted, collaborative environment, join our team in shaping the future of AI and cloud technologies, delivering...


  • Santa Clara, CA, United States NVIDIA Corporation Full time

    NVIDIA is looking for a Senior Software Engineer to join the Cumulus Linux team! We present you with an opportunity to be part of the team that develops the Network Operating System that powers data centers that are accelerated, disaggregated and software-defined to meet the exploding growth in AI and high-performance computing. You'll be part of...


  • San Francisco, CA, United States Spice AI Full time

    Building data and AI-driven software is still way too hard, even for advanced developers. At Spice AI, we’re helping developers combine code with data and machine learning (ML) to create truly intelligent applications. Spice AI is on a mission to make this as easy as creating a modern web page. Spice.ai provides building blocks for data and AI-driven...


  • Santa Clara, California, United States NVIDIA Full time

    Hardware Infrastructure is seeking a Program Manager to lead programs and initiatives creating infrastructure to enable NVIDIA's most advanced AI and hardware researchers and engineers to create the future of computing. This leader will guide engineering programs in best-of-industry processes that enable a fast-paced and rapidly growing roadmap. They will...


  • San Jose, CA, United States Cisco Full time

    Who We Are: The Cisco Security AI team delivers AI products and platforms for all Cisco secure products and portfolios so businesses around the world can defend against threats and safeguard the most vital aspects of their business with security resilience. We are passionate about making businesses secure and simplifying security with zero compromise using AI and...


  • Santa Clara, United States NVIDIA Full time

    Senior Solution Engineer - AI and ML Storage Architecture. Locations: US, CA, Santa Clara; US, Remote. Time type: Full time. Posted yesterday. Job requisition ID: JR1980968. As a Senior Solution Engineer specializing in AI/ML Storage Architecture, you will be an integral part of...


  • Santa Clara, CA, United States NVIDIA Corporation Full time

    Senior Software Engineer, GPU Communications and Networking. Locations: US, CA, Santa Clara. Time type: Full time. Posted 2 days ago. Job requisition ID: JR1972306. NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance...


  • Santa Clara, United States d-Matrix Full time

    d-Matrix has fundamentally changed the physics of memory-compute integration with our digital in-memory compute (DIMC) engine. The “holy grail” of AI compute has been to break through the memory wall to minimize data movements. We’ve achieved this with a first-of-its-kind DIMC engine. Having secured over $154M, including $110M in our Series B offering, d-Matrix...


  • Santa Clara, CA, United States NVIDIA Corporation Full time

    Senior Full-Stack Software Engineer. Locations: US, CA, Santa Clara. Time type: Full time. Posted 2 days ago. Job requisition ID: JR1982319. Widely considered to be one of the technology world’s most desirable employers, NVIDIA is an industry leader with groundbreaking developments in...


  • Santa Clara, CA, United States NVIDIA Full time

    NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a “learning machine” that constantly evolves...


  • Santa Clara, United States NVIDIA Full time

    Senior Platform Software Engineer, AI Server - GPU. Locations: US, CA, Santa Clara; US, Remote. Time type: Full time. Posted 4 days ago. Job requisition ID: JR1980965. NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics,...


  • San Francisco, CA, United States FATHOM Full time

    Fathom is on a mission to use AI to understand and structure the world's medical data, starting by making sense of the terabytes of clinician notes contained within the electronic health records of the world's largest health systems. Our deep learning engine automates the translation of patient records into the billing codes used for healthcare...


  • Santa Clara, United States NVIDIA Full time

    NVIDIA platforms are at the center of generative AI, autonomous driving, industrial robots, medical instruments and data centers across the world where GPU accelerated AI is revolutionizing the technology industry. As a platform company we deliver not just hardware solutions but also vertically integrated software stacks, GPU accelerated SDKs, libraries and...


  • Santa Clara, CA, United States NVIDIA Full time

    The Automotive Vehicles Test Engineering team is searching for a creative and experienced Software Engineer to help us bring NVIDIA's autonomous vehicle solution out to the world. You will work with hardworking and dedicated multi-functional engineering development teams across various vehicle subsystems to integrate their work into our AV SW platform, while...