Senior AI and ML Infra Engineer, Research Clusters

3 days ago


Austin, Texas, United States NVIDIA Full time

At NVIDIA in Santa Clara, CA, USA, we are seeking a skilled AI/ML Infrastructure Engineer to join our team. As an engineer, you will have a unique opportunity to boost our researchers' productivity by implementing improvements across the entire stack. Your main responsibility will be to identify and close infrastructure gaps so that our solutions are reliable, efficient, and scalable. Join us and help shape the future of AI/ML technology.

In this role, you will have the chance to contribute to advanced AI/ML infrastructure solutions that directly improve the efficiency of our highly skilled research teams. We also offer:

• A dynamic and collaborative environment that values innovation, creativity, and continuous improvement.
• Competitive compensation and a comprehensive benefits package.
• Opportunities for professional growth and career advancement within the AI/ML infrastructure domain.
What you will be doing:

• Work closely with our research teams to understand their infrastructure requirements and challenges, and translate those observations into actionable improvements.
• Design and implement solutions for critical areas such as storage management for datasets and logs, error attribution, and core reliability issues within our large-scale GPU clusters.
• Continuously monitor and optimize the performance of our AI/ML infrastructure, ensuring high availability, scalability, and efficient resource utilization.
• Create and deploy automation tools, monitoring solutions, and operational strategies that simplify infrastructure management and minimize manual work.
• Help define and improve key metrics of AI researcher productivity, so that our work is tied to measurable results.
• Collaborate with diverse teams, including researchers, data engineers, and DevOps professionals, to build a seamless, integrated AI/ML infrastructure ecosystem.
• Keep abreast of the latest advancements in AI/ML infrastructure technologies, frameworks, and best practices, and promote their adoption within the company.
What we need to see:

• BS or equivalent experience (MS preferred) in Computer Science or a related field, with 12+ years of relevant experience.
• Strong background in software engineering, with experience building and maintaining large-scale distributed systems, preferably in the context of AI/ML infrastructure.
• Proficiency in programming languages such as Python, Go, or C++, as well as familiarity with cloud computing platforms (e.g., AWS, GCP, Azure).
• Hands-on experience with containerization and orchestration technologies (e.g., Docker, Kubernetes), automation tools (e.g., Ansible, Terraform), and monitoring solutions (e.g., Prometheus, Grafana).
• Understanding of AI/ML workflows, including data processing, model training, and inference pipelines.
• Excellent problem-solving skills, with the ability to analyze complex systems, identify bottlenecks, and implement scalable solutions.
• Excellent communication and collaboration skills, with the ability to work effectively with diverse teams and individuals.
• Enthusiasm for continual learning and for keeping abreast of emerging technologies and best practices in the AI/ML infrastructure field.
NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most experienced and versatile people in the world working for us and, due to unprecedented growth, our extraordinary engineering teams are growing fast. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.

The base salary range is 220,000 USD - 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.