Senior MLOps Engineer, GenAI Framework
4 weeks ago
NVIDIA is seeking a senior build and continuous integration (CI/CD) engineer for its GenAI Frameworks (NeMo, Megatron Core) team.
NVIDIA NeMo is an open-source, scalable, and cloud-native framework built for researchers and developers working on Large Language Models (LLMs), Multimodal (MM) models, and Speech AI.
NeMo provides end-to-end model training, including data curation, alignment, customization, evaluation, deployment, and tooling to optimize performance and user experience.
Building upon modern DevOps tools, your work will enable GenAI framework software engineers and deep learning algorithm engineers to work efficiently with a wide variety of deep learning algorithms and software stacks as they seek out opportunities for performance optimization and continuously deliver high-quality software.
Key Responsibilities:
- Architect and lead the build, release, and continuous integration processes for our Generative AI framework and the libraries related to the NeMo framework and Megatron Core.
- Propose, implement, and deploy efficient and scalable DevOps solutions that allow our fast-growing team to release software more frequently while maintaining high quality and top performance.
- Work with industry-standard tools (Kubernetes, Docker, Slurm, Ansible, GitLab, GitHub Actions, Jenkins, Artifactory, Jira).
- Assist with cluster operations and system administration (managing servers, team accounts, clusters).
- Automate away recurring tasks (for example, detecting DL algorithm accuracy and performance regressions, and designing and developing new quality control measures such as code analysis) while employing and advancing best practices.
- Work closely with the DL framework and libraries (CUDA, cuDNN, cuBLAS) teams and with other relevant teams within NVIDIA that provide software build, testing, and release-related infrastructure.
Requirements:
- BS or MS degree in Computer Science, Computer Architecture, or a related technical field, or equivalent experience.
- 5+ years of industry experience in infrastructure engineering or DevOps.
- Strong system-level programming skills in languages such as Python, and in shell scripting.
- Strong understanding of build/release systems and CI/CD, and experience with solutions like GitLab, GitHub, Jenkins, etc.
- Experience with Linux system administration.
- Proficient with containerization and cluster management technologies like Docker and Kubernetes.
- Experience with build tools, including Make and CMake.
- Experience using or deploying source code management (SCM) solutions such as GitLab, GitHub, Perforce, etc.
- Excellent problem-solving and debugging skills.
- Great teammate who can collaborate and influence in a dynamic environment with excellent interpersonal and written communication skills.
Preferred Qualifications:
- Previous experience with GPU-accelerated systems.
- Hands-on experience with DL frameworks (PyTorch, JAX, TensorFlow).
- Experience with cluster/cloud technologies (Slurm, Lustre, Kubernetes).
- Experience with HPC hardware systems such as compute clusters and HPC software performance benchmarking on such systems.
Compensation:
The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits.
NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer.
-
Senior AI Software Engineer, GenAI Framework
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
We are seeking a highly skilled AI Software Engineer to join our team at NVIDIA. As a key member of our team, you will be responsible for crafting and implementing new model development features, optimizations, defining APIs, analyzing and tuning performance, expanding functionality coverage to build larger, coherent toolsets and libraries. Key...
-
Senior MLOps Engineer, Deep Learning Algorithms
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
At NVIDIA, we're building software that will be used by the entire world. As a Senior MLOps Engineer, Deep Learning Algorithms, you'll work with high-class software engineers to implement a large-scale toolset that tests deep learning models and frameworks on the most powerful computers. The ability to work in a multifaceted, fast-paced environment is...
-
Senior GenAI Solutions Architect
4 weeks ago
Santa Clara, California, United States Amazon Full time
Job Description: We are seeking a highly skilled GenAI Solutions Architect to join our team at Amazon. As a GenAI Solutions Architect, you will be responsible for designing and implementing scalable GenAI solutions for our customers. You will work closely with our engineering teams to develop and deploy GenAI workloads on AWS, and will facilitate the...
-
Global GenAI Specialist, Foundation Models GTM
4 weeks ago
Santa Clara, California, United States Amazon Full time
About the Role: We are seeking a highly skilled Business Development Specialist to join our Worldwide Specialist Organization (WWSO) Frameworks ML team. As a Business Development Specialist, you will be responsible for defining, building, and deploying targeted strategies to accelerate customer adoption of our GenAI services and solutions across industry...
-
Senior Solutions Architect, Global Partner Team
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
About the Role: NVIDIA is seeking a highly skilled Solutions Architect to join our Global Partner Team. As a key member of our team, you will be responsible for working with our Global Systems Integrator partners and AI consulting firms to develop and implement innovative solutions that leverage NVIDIA's cutting-edge technology. Key Responsibilities: Becoming an...
-
Senior Cloud Solutions Architect for GenAI
4 weeks ago
Santa Monica, California, United States Amazon Full time
About the Role: We are seeking a highly skilled GenAI Solutions Architect to join our team. As a key member of our organization, you will be responsible for designing and implementing GenAI solutions for our media and entertainment clients. Your primary focus will be on developing and deploying GenAI models and applications using AWS GenAI services such as...
-
Senior Applied Science Manager, GenAI, AWS Kumo
4 weeks ago
Santa Clara, California, United States Amazon Web Services, Inc. Full time
About the Role: We are seeking a highly skilled Senior Applied Science Manager to lead our GenAI team at AWS Kumo. As a key member of our organization, you will be responsible for developing and implementing machine learning models that drive business outcomes. Your expertise in natural language processing, deep learning, and generative AI will be instrumental...
-
Principal Scientist
4 weeks ago
Santa Clara, California, United States Amazon Full time
About the Role: We are seeking a highly skilled Principal Scientist to join our team at Amazon. As a key member of our organization, you will be responsible for leading advanced research in Large Language Models (LLMs), Generative AI, and Deep Learning. Key Responsibilities: Conduct research and develop novel algorithms, architectures, and methodologies for...
-
Senior AI/ML Engineer
1 month ago
Santa Clara, California, United States Eightfold LLC Full time
About Eightfold.ai: We're at the forefront of innovation in the AI-driven HR tech space, shaping the future of how organizations find, manage, and empower their talent. Our groundbreaking AI platform is revolutionizing the industry, and we're looking for exceptional engineers to join our team and drive the next wave of advancements. About the AI/ML Team: Our...
-
Senior Software Engineer
1 month ago
Santa Clara, California, United States Intel Full time
Job Summary: We are seeking a highly skilled Senior Engineer to join our team, specializing in ML Ops and DevSecOps. This role requires a deep understanding of machine learning operations, DevSecOps practices, and the integration of security within the CI/CD pipeline. Key Responsibilities: ML Ops: Design and implement scalable and reliable ML...
-
Senior Product Architect, HPC and AI
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Title: Senior Product Architect, HPC and AI
Job Summary: We are seeking a visionary Product Architect to join our team at NVIDIA. As a key member of our team, you will harness your infrastructure expertise to create reference designs for the world's most powerful AI clusters.
Responsibilities:
* Design the next-gen datacenter-scale AI infrastructure,...
-
AI Systems Engineer
4 weeks ago
Santa Clara, California, United States Meshy Full time
About Meshy: We are a leading 3D generative AI company headquartered in Silicon Valley, on a mission to unleash 3D creativity. We simplify the creation of distinctive 3D assets for both professional artists and hobbyists by transforming text and images into stunning 3D models in minutes. Our global team of experts in computer graphics, AI, and art includes...
-
AI 3D Model Engineer
4 weeks ago
Santa Clara, California, United States Meshy Full time
About Meshy: We are a leading 3D generative AI company headquartered in Silicon Valley, on a mission to Unleash 3D Creativity. Our platform simplifies the creation of distinctive 3D assets for both professional artists and hobbyists by transforming text and images into stunning 3D models in minutes. Our global team of 30 experts in computer graphics, AI,...
-
Senior Systems Software Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
We are seeking a Senior Systems Software Engineer to join our TAO Toolkit Team at NVIDIA. Our team builds frameworks, services, algorithms, and tools that power the largest NVIDIA Multi-Modal Foundation Models and their customization. Key Responsibilities: Design, develop, and support a platform to access large datasets, integrating data from various...
-
Senior Software Quality Assurance Engineer
4 weeks ago
Santa Clara, California, United States Ab Ovo Full time
About the Role: Ab Ovo, Inc. is seeking a highly skilled Senior Software Quality Assurance Engineer to join our team. As a key member of our software development team, you will be responsible for ensuring the quality and reliability of our software products. Key Responsibilities: Design and implement automated test scripts using Java and JavaScript...
-
Senior Machine Learning Engineer
4 weeks ago
Santa Clara, California, United States Eightfold LLC Full time
About Eightfold.ai: We're a pioneering force in AI-driven HR tech, pushing the boundaries of innovation and shaping the future of talent management. Our cutting-edge AI platform is revolutionizing the industry, and we're seeking exceptional engineers to join our team and drive the next wave of advancements. About the AI/ML Team: Our AI/ML team is the driving...
-
Senior Database Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Summary: NVIDIA is seeking a highly skilled Senior Database Engineer to join our team. As a Senior Database Engineer, you will be responsible for researching and developing techniques to GPU-accelerate high-performance database and ETL applications.
Key Responsibilities:
- Research and develop techniques to GPU-accelerate high-performance database and ETL...
-
Santa Clara, California, United States Palo Alto Networks Full time
Developer Relations Role Overview: The Palo Alto Networks Application Framework is revolutionizing the future of security innovation, enabling users to access, evaluate, and adopt the most compelling new security technologies. As a key member of the Developer Relations team, you will play a crucial role in helping developers connect with the technologies,...
-
Software Quality Assurance Specialist
4 weeks ago
Santa Clara, California, United States ServiceNow Full time
Transforming Quality Assurance at ServiceNow: At ServiceNow, we're revolutionizing the way organizations work. As a Software Quality Assurance Specialist, you'll play a crucial role in ensuring the quality of our innovative AI-enhanced technology.
Key Responsibilities:
- Maintain and enhance existing automation test frameworks
- Collect and report quality metrics...