Platform ML Engineering Manager, Training
2 months ago
About the Team
The Platform ML team builds the ML side of our state-of-the-art internal training framework used to train our cutting-edge models. We work on distributed model execution as well as the interfaces and implementation for model code, training, and inference.
Our priorities are to maximize training throughput (how quickly we can train a new model) and researcher throughput (how quickly we can develop new models) with the goal of accelerating progress towards AGI. We frequently collaborate with other teams to speed up the development of new capabilities.
About the Role
We are looking for an experienced engineering manager to help lead critical work on our shared internal training stack and grow the team. Our training stack is primarily by teams in Research Platform.
In this role, you will:
- Get SOTA throughput for our most important research models.
- Reduce the time it takes to try out new research ideas for training new models.
- Collaborate closely with researchers and other systems engineers to maximize the benefits of our shared internal training stack.
- Hire world-class AI systems engineers in one of the most competitive hiring markets.
- Coordinate the training needs of OpenAI's research teams.
- Create a diverse, equitable, and inclusive culture that makes all feel welcome while enabling radical candor and the challenging of groupthink.
You might thrive in this role if you:
- Have 3+ years of experience in engineering management and 7+ years as an IC working with high-scale distributed systems and ML systems.
- Have experience with ML systems, particularly high-scale distributed training or inference for modern LLMs.
- Have familiarity with the latest AI research and working knowledge of how these systems are efficiently implemented.
- Care deeply about diversity, equity, and inclusion, and have a track record of building inclusive teams.
About OpenAI
OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability or any other legally protected status.
OpenAI Affirmative Action and Equal Employment Opportunity Policy Statement
For US Based Candidates: Pursuant to the San Francisco Fair Chance Ordinance, we will consider qualified applicants with arrest and conviction records.
We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.
OpenAI Global Applicant Privacy Policy
At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.