Platform ML Engineering Manager, Training
4 weeks ago
The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference.
Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models) with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.
About the Role
We are looking for an experienced engineering manager to help lead critical work on our shared internal training stack and grow the team. Our training stack is primarily by teams in Research Platform.
In this role, you will:
- Get SOTA throughput for our most important research models.
- Reduce the time it takes to try out new research ideas for training new models.
- Collaborate closely with researchers and other systems engineers to maximize the benefits of our shared internal training stack.
- Hire world-class AI systems engineers in one of the most competitive hiring markets.
- Coordinate the training needs of OpenAI's research teams.
- Create a diverse, equitable, and inclusive culture that makes all feel welcome while enabling radical candor and the challenging of groupthink.
You might thrive in this role if you:
- Have 3+ years of experience in engineering management and 7+ years as an IC working with high-scale distributed systems and ML systems.
- Have experience with ML systems, particularly high-scale distributed training or inference for modern LLMs.
- Have familiarity with the latest AI research and working knowledge of how these systems are efficiently implemented.
- Care deeply about diversity, equity, and inclusion, and have a track record of building inclusive teams.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.
For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
-
Software Engineer, ML Infrastructure
3 weeks ago
San Francisco, United States Scale AI, Inc. Full time
As a software engineer on the ML Infrastructure team, you will work on developing the platform for orchestrating post-training and model evaluation jobs. At Scale, we are constantly developing new data sources and running experiments to understand their impact on ML models. To support this effort, we are looking for engineers who are comfortable navigating...
-
Principal Product Manager, ML Platform
2 weeks ago
San Francisco, United States The Product Folks Full time
Adobe is the global leader in digital media and digital marketing solutions. Our creative, marketing and document solutions empower everyone – from emerging artists to global brands – to bring digital creations to life and deliver immersive, compelling experiences to the right person at the right moment for the best results. In short, Adobe is...
-
Platform ML Engineering Manager, Model Graph
2 weeks ago
San Francisco, United States OpenAI Full time
About the Team
The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference. Our priorities are to maximize training throughput (how quickly we can train a new model)...
-
ML Platform Engineer
4 weeks ago
San Francisco, United States Abridge AI, Inc Full time
Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients. Our enterprise-grade technology transforms patient-clinician conversations into...
-
Senior AI/ML Platform Manager
1 month ago
San Jose, California, United States PayPal Full time
At PayPal, we're revolutionizing commerce globally, and we need a Senior AI/ML Platform Manager to help us scale our AI/ML infrastructure and platform. We're looking for a strong Senior Product Manager with a deep understanding of the AI/ML Platform stack and a strong business acumen to partner with Data Scientists and ML Engineers in delivering a...
-
Software Engineer
4 months ago
San Francisco, United States CentML Full time
About Us
We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential. Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts at...
-
Managing Director, Platform Sales
6 days ago
San Mateo, United States Snowflake Computing Full time
Build the future of the AI Data Cloud. Join the Snowflake team. Snowflake is seeking an accomplished Managing Director, Platform Sales, AI & ML to lead and drive the sales strategy for our AI & ML workload. As a senior leader within the Platform Sales team, you will be responsible for aligning our go-to-market strategies with the business objectives for AI...
-
Senior Manager, AI/ML Platform
3 weeks ago
San Jose, United States PayPal Full time
The Company
PayPal has been revolutionizing commerce globally for more than 25 years. Creating innovative experiences that make moving money, selling, and shopping simple, personalized, and secure, PayPal empowers consumers and businesses in approximately 200 markets to join and thrive in the global economy. We operate a global, two-sided network at scale that...
-
Senior Geospatial AI/ML Engineer
18 hours ago
San Francisco, United States Wherobots Inc Full time
We are looking for passionate, skilled, and experienced ML engineers and data scientists to join Wherobots’ dynamic team in building the distributed geospatial cloud products of the future. Wherobots offers a fully-managed cloud platform designed to simplify geospatial analytics and AI applications. Our platform empowers customers to analyze massive...
-
Senior Geospatial AI/ML Engineer
4 days ago
San Francisco, United States Wherobots Full time
We are looking for passionate, skilled, and experienced ML engineers and data scientists to join Wherobots' dynamic team in building the distributed geospatial cloud products of the future. Wherobots offers a fully-managed cloud platform designed to simplify geospatial analytics and AI applications. Our platform empowers customers to analyze massive amounts...
-
ML Engineer
4 weeks ago
San Francisco, United States LOG10 LLC Full time
About Log10 Inc
Log10 is addressing the challenges around reliability and consistency of LLM-powered applications via a platform that provides AI-powered evaluations, fine-tuning and debugging tools. We are currently a team of 8 having previously worked in AI and infra roles at companies such as Intel, MosaicML, Adobe, Docker, PostEra, Starburst and Second...
-
Senior Data and ML Infrastructure Engineer
2 weeks ago
San Francisco, California, United States Unity Technologies Full time
About the Role
We're seeking a skilled Senior Data and ML Infrastructure Engineer to join our team at Unity. As a key member of our Data & ML Platform team, you will design and optimize large-scale data platforms and machine learning infrastructure systems for efficiency, reliability, and cost-effectiveness.
Key Responsibilities:
Design and optimize large-scale...
-
Platform Engineer
3 weeks ago
San Francisco, United States Eventualcomputing Full time
About Eventual
Eventual is a data platform that helps data scientists and engineers build data applications across ETL, analytics and ML/AI.
OUR PRODUCT IS OPEN-SOURCE AND USED AT ENTERPRISE SCALE
Our distributed data engine Daft is open-sourced and runs on 800k CPU cores daily. This is more compute than Frontier, the world's largest supercomputer! Today's data...
-
Machine Learning Engineer, GenAI Platform
4 weeks ago
San Francisco, United States Magical Tome Full time
About Tome
Tome is a unified platform for enterprise sellers and account managers. We use state-of-the-art models to simplify complex research and strategic planning for sellers. Tome can surface the most actionable knowledge about a customer from within internal systems as well as from public information across thousands of data sources. Our system is tuned...
-
San Francisco, United States Discord Full time
Discord is used by over 200 million people every month for many different reasons, but there’s one thing that nearly everyone does on our platform: play video games. Over 90% of our users play games, spending a combined 1.5 billion hours playing thousands of unique titles on Discord each month. Discord plays a uniquely important role in the future of...
-
ML Operations Engineer
1 week ago
San Francisco, United States RemoteWorker CA Full time
Company Overview: Welcome to the forefront of machine learning operations! At our company, we're driving the next wave of AI revolution through cutting-edge ML operations technologies. Our mission is to develop scalable and reliable ML systems that empower businesses and revolutionize industries. Join us and be part of a dynamic team committed to pushing the...
-
Data Platform Engineer
5 hours ago
San Francisco, United States Robust Intelligence Full time
Robust Intelligence's mission is to eliminate AI Risk. As the world increasingly adopts AI into automated decision processes, we inherit great risk. Our flagship product is built to be integrated with existing AI systems to enumerate and eliminate risks caused by unintentional and intentional (adversarial) failure modes. With Generative AI becoming...
-
ML Infrastructure Engineer
3 weeks ago
San Francisco, United States Abridge AI Inc. Full time
Abridge was founded in 2018 with the mission of powering deeper understanding in healthcare. Our AI-powered platform was purpose-built for medical conversations, improving clinical documentation efficiencies while enabling clinicians to focus on what matters most: their patients. Our enterprise-grade technology transforms patient-clinician conversations into...