Principal Engineer, Distributed Machine Learning Solutions

4 weeks ago


Santa Clara, California, United States NVIDIA Full time

NVIDIA is seeking a Principal Engineer to lead the development of GPU-accelerated distributed machine learning solutions. The ideal candidate will have a strong background in software development and experience with distributed machine learning frameworks such as Apache Spark.

The successful candidate will design and develop new user-friendly APIs and libraries to optimize the use of existing deep learning and machine learning frameworks in GPU-enabled Spark clusters for distributed training and inference at scale.

Key responsibilities will include:

  • Designing and developing GPU-accelerated machine learning libraries for distributed training and inference on Spark clusters
  • Demonstrating superior performance of developed solutions on industry-standard benchmarks and datasets
  • Making technical contributions to enhance the capabilities of open-source projects such as RAPIDS, XGBoost, and Apache Spark
  • Working with NVIDIA partners and customers to deploy distributed machine learning algorithms in cloud or on-premises environments
  • Staying up to date with published advances in distributed machine learning systems and algorithms
  • Providing technical mentorship to a team of engineers

Requirements include:

  • BS, MS, or PhD in Computer Science, Computer Engineering, or a closely related field
  • 12+ years of work or research experience in software development
  • 5+ years of experience as a technical lead in distributed machine learning and/or deep learning
  • 3+ years of open-source development experience
  • 3+ years of hands-on experience with Spark MLlib, XGBoost, and/or PyTorch
  • Knowledge of the internals of Apache Spark MLlib
  • Experience with Kubernetes, YARN, Spark, and/or Ray for distributed ML orchestration
  • Proven technical skills in designing, implementing, and delivering high-quality distributed systems
  • Excellent programming skills in C++, Scala, and Python
  • Familiarity with agile software development practices

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer.



  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description: At Palo Alto Networks, we're seeking a highly skilled Principal Machine Learning Engineer to join our team. As a key member of our cybersecurity team, you will be responsible for designing and developing advanced machine learning solutions to protect our customers' digital way of life. Our mission is to leverage AI and machine learning...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Principal Machine Learning Engineer to join our team at Palo Alto Networks. As a key member of our cybersecurity team, you will be responsible for designing and developing advanced AI and machine learning solutions to protect our customers' digital way of life. Your Responsibilities: Design and develop workflow...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking a highly skilled Senior Distributed Machine Learning Engineer to join our team focused on GPU-accelerated Apache Spark. Data scientists often apply machine learning (ML) and deep learning (DL) algorithms over large datasets to train AI models. To accelerate and scale the model training, some libraries (e.g., XGBoost, RAPIDS cuML, PyTorch,...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Protecting Our Digital Future: Palo Alto Networks is the fastest-growing security company in history, and we're looking for a talented Machine Learning Software Engineer to join our team. As a key member of our Data Science team, you'll use data science and machine learning to solve complex problems and develop innovative solutions to address cyber security...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Machine Learning Engineer to join our team. As a key member of our cybersecurity team, you will be responsible for designing and developing advanced machine learning solutions to protect our customers' networks. Key Responsibilities: Design and develop machine learning models to detect and...


  • Santa Clara, California, United States Amazon Full time

    About the Role: We are seeking a skilled Machine Learning Engineer to join our team at Amazon. As a Machine Learning Engineer, you will be responsible for designing, developing, and deploying machine learning models to solve complex business problems. You will work closely with data scientists, software engineers, and other stakeholders to identify...


  • Santa Clara, California, United States Amazon Full time

    About the Role: We are seeking a highly skilled Machine Learning Engineer to join our team at Amazon. As a Machine Learning Engineer, you will be responsible for designing and developing cloud-based AI solutions that meet the needs of our customers. Key Responsibilities: Design and develop cloud-based AI solutions using machine learning algorithms and...


  • Santa Clara, California, United States XPENG Motors Full time

    We are seeking a highly skilled Staff Machine Learning Engineer - AI Foundation to join our team at XPeng Motors. The ideal candidate will have a deep understanding of large-scale deep learning models and experience with PyTorch. Responsibilities include designing, training, and deploying large deep learning models that can leverage vast amounts of labeled and...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description. Your Career: Palo Alto Networks is looking for a talented Data Scientist with expertise in Machine Learning to tackle complex security problems and develop innovative solutions. Your Impact: Apply Machine Learning techniques to security challenges in a production environment. Develop creative solutions to address hard problems. Work with distributed...


  • Santa Clara, California, United States Eightfold LLC Full time

    About Eightfold: Eightfold AI is the industry leader in AI-powered talent intelligence and transforming the way organizations manage their talent. Our AI-powered Talent Intelligence Platform helps companies identify, attract, and retain top talent, while also providing employees with the tools they need to grow and succeed in their careers. About the AI/ML...


  • Santa Clara, California, United States XPENG Motors Full time

    Job Title: Machine Learning Engineer - AI Foundation. Xpeng Motors is a leading smart electric vehicle company that designs, develops, manufactures, and markets smart EVs with advanced Internet, AI, and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers. We are...


  • Santa Clara, California, United States SoundHound Full time

    About the Role: We are seeking a highly skilled Data Scientist and Machine Learning Engineer to join our team at SoundHound. As a key member of our engineering team, you will be responsible for developing and implementing advanced machine learning models to drive business growth and improve user experience. With a huge amount of data from hundreds of millions...


  • Santa Clara, California, United States Eightfold LLC Full time

    About the Role: The AI/ML Engineer will be part of our cutting-edge team building industry-leading Machine Learning models for use cases at scale. As a key contributor, you will develop next-generation technologies that disrupt the $400 Billion HR Tech industry. Your work will involve pushing the boundaries of applied Machine Learning on challenging and huge...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description. Your Career: As a member of the Internet Security Research Team, you will work closely with data scientists, security researchers, and other engineers on implementing different projects to detect and defend against various emerging threats in the areas of Web Security. You will build machine learning models and develop big data and distributed...


  • Santa Clara, California, United States Precisionneuro Full time

    About Precision Neuroscience: Precision Neuroscience is a pioneering company that is revolutionizing the field of brain-computer interfaces (BCIs). Our mission is to develop cutting-edge technologies that enable people with neurological conditions to regain independence and communicate with loved ones. We are seeking an experienced Director of Machine Learning...


  • Santa Clara, California, United States Amazon Development Center U.S., Inc. Full time

    About the Role: We are seeking a highly skilled Senior Machine Learning Engineer to join our team at Amazon Development Center U.S., Inc. The ideal candidate will have a strong background in machine learning, large language models, and multimodal models, with experience in programming languages such as Java, C++, and Python. As a Senior Machine Learning...


  • Santa Clara, California, United States Eightfold LLC Full time

    About Eightfold AI: We are the industry leader in AI-powered talent intelligence and transforming the way organizations manage their talent. Our AI-powered Talent Intelligence Platform™ helps companies identify, attract, and retain top talent, while also providing employees with the tools they need to grow and succeed in their careers. The AI/Machine Learning...


  • Santa Clara, California, United States Amazon Full time

    Amazon - An Industry Leader in E-commerce and Technology: We are seeking talented Senior Machine Learning Engineers to join our team at Amazon, working on various AI projects that enhance the shopping experience for our customers. As a Senior Machine Learning Engineer, you will be responsible for designing, developing, and deploying large-scale machine...


  • Santa Clara, California, United States NVIDIA Full time

    Job Description: NVIDIA is seeking a highly skilled Principal Engineer to design and build a software factory that will take an AI model and create deployable services across Cloud and On-prem Kubernetes environments. The ideal candidate will have advanced programming skills to build distributed and compute systems, backend services, microservices, and cloud...


  • Santa Clara, California, United States Apple Full time

    Role Summary: At Apple, we're committed to creating innovative products and services that make a difference in people's lives. As a Staff Engineer on our ML Compute Team, you'll play a critical role in designing and delivering cutting-edge machine learning infrastructure that powers our products and services. Your Key Responsibilities: Collaborate with...