ML Infrastructure Deployment Specialist
3 days ago
About CentML
We believe AI will fundamentally transform how people live and work. Our mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential.
Our founding team is made up of experts in AI, compilers, and ML hardware with extensive industry experience at Amazon, Google, Microsoft Research, Nvidia, Intel, Qualcomm, and IBM.
We are seeking a highly motivated and skilled senior infrastructure engineer to join our team in designing, developing, and maintaining the CentML platform. As an infrastructure engineer, you will be responsible for laying out the design of a deployment infrastructure for ML training and inference jobs over GPU clusters spanning multiple cloud service providers.
Key Responsibilities
- Design and lead the development of the deployment infrastructure of the CentML platform, managing the hardware resources necessary to deploy ML training and inference applications.
- Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads to efficiently utilize the hardware resources in the GPU cluster.
- Collaborate with product teams to define new features and goals for improving the CentML platform.
Required Qualifications
- 4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform).
- A big plus if you have contributed to Kubernetes and have expertise in container runtime technologies like Docker Engine, containerd, or CRI-O.
- Experience with deploying and managing cloud infrastructure on AWS, GCP, or Azure.
- Past experience in building GPU clusters for large-scale ML training and inference is desirable.
- Knowledge of GPU architecture and Nvidia GPU virtualization technologies is highly desirable.
- Strong coding skills in languages like Python, Java, Go, and/or C/C++.
Benefits & Perks
An open and inclusive work environment.
Employee stock options.
Best-in-class medical and dental benefits.
Parental Leave top-up for 6 months.
Professional development budget.
Flexible vacation time to promote a healthy work-life blend.
We are an equal opportunity employer and value diversity at our company. We do not discriminate based on race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability, or any other protected ground of discrimination under applicable human rights legislation.
CentML strives to respect the dignity and independence of people with disabilities and is committed to giving them the same opportunity to succeed as all other employees.
Inclusiveness is core to our culture at CentML, and we strive to ensure you get the most from your interview experience. CentML makes reasonable accommodations for applicants with disabilities. If a reasonable accommodation is needed to participate in the job application or interview process, please reach out to the Talent team.
-
Scalable ML Deployment Specialist
1 week ago
San Francisco, California, United States ZipRecruiter Full time
About the Role
We are seeking an experienced Machine Learning Systems Engineer to join our team. As an ML Systems Engineer at Abridge, you will be responsible for scaling and deploying machine learning models to handle increasing traffic demands and integrating them with various platforms.
Architect, design, and implement ML software systems for deploying and...
-
AI Infrastructure Deployment Specialist
4 weeks ago
San Francisco, California, United States CentML Full time
About Our Mission
We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential.
Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts...
-
San Francisco, California, United States Unity Technologies Full time
We are seeking a talented Senior Data and ML Infrastructure Engineer to join our team at Unity Technologies. This role is responsible for designing and optimizing large-scale data platforms and machine learning infrastructure systems for efficiency, reliability, and cost-effectiveness.
Job Overview
Unity is the world's leading platform of tools for creators to...
-
San Francisco, California, United States Unity Full time
Welcome to Unity, the world's leading platform of tools for creators to build and grow real-time games, apps, and experiences across multiple platforms. As a highly skilled data and machine learning (ML) infrastructure engineer, you will play a crucial role in designing and optimizing large-scale data platforms and ML infrastructure systems for efficiency,...
-
Senior ML Infrastructure Engineer
2 weeks ago
San Francisco, California, United States Fieldguide Full time
About Us: Fieldguide is a pioneering company that's revolutionizing the audit and advisory industry by leveraging cutting-edge Machine Learning (ML) technology. As a Senior Platform Engineer, Machine Learning, you'll be instrumental in building and maintaining the infrastructure that powers our ML solutions, enabling us to deliver impactful results to our...
-
Senior ML Infrastructure Architect
1 week ago
San Francisco, California, United States Delphina Full time
About Delphina
We are on a mission to revolutionize the way data scientists work. Our vision is to empower teams to build powerful machine learning models quickly and efficiently, without the pain points associated with traditional tools.
As a Founding ML Infrastructure Engineer at Delphina, you will be part of a team that has previously led large data science...
-
AI Infrastructure Specialist
5 days ago
San Francisco, California, United States Unreal Gigs Full time
Unreal Gigs is seeking an experienced AI Infrastructure Specialist to design, automate, and manage robust machine learning pipelines.
Job Overview
This role involves building scalable infrastructure for AI workloads, automating workflows, and developing tools that enable continuous integration and continuous delivery (CI/CD) of ML...
-
ML Infrastructure Architect
1 week ago
San Francisco, California, United States Harnham Full time
Transforming the Future of Live Entertainment
Harnham is a leading tech company that's changing the game by creating seamless digital solutions for millions of users.
We're looking for a highly skilled Machine Learning Platform Engineer to design and build infrastructure that accelerates the ML lifecycle, enabling scalable, reliable systems for critical...
-
Cloud Native ML Specialist
2 weeks ago
San Francisco, California, United States Abridge AI, Inc. Full time
About the Job
We are seeking an experienced Machine Learning Systems Engineer to join our team. As an ML Systems Engineer at Abridge, you will be responsible for scaling and deploying machine learning models to handle increasing traffic demands and integrating them with various platforms.
You will play a pivotal role in building a scalable infrastructure that...
-
AI Infrastructure Specialist
2 weeks ago
San Francisco, California, United States ZipRecruiter Full time
Job Description
We're looking for a highly skilled AI Infrastructure Specialist to join our team of engineers and data scientists. As an AI Infrastructure Specialist, you'll play a key role in designing, building, and optimizing our AI infrastructure to support the needs of our organization.
About the Role
Design and Build Infrastructure: Design and build...
-
San Francisco, California, United States Unity Full time
Job Overview
We are seeking a Senior Data Engineer and Infrastructure Specialist to join our Data & ML Platform team at Unity.
About the Role
In this position, you will design and optimize large-scale data platforms and machine learning infrastructure systems for efficiency, reliability, and cost-effectiveness. You will also lead improvements in infrastructure...
-
Senior Systems Engineer
3 days ago
San Francisco, California, United States CentML Full time
About CentML
We're a cutting-edge technology company dedicated to revolutionizing the field of artificial intelligence. Our goal is to make AI more accessible and affordable for everyone.
Our Team
Our team consists of world-renowned experts in AI, compilers, and ML hardware who have led efforts at top tech companies like Amazon, Google, and Microsoft.
Job...
-
San Francisco, California, United States Unreal Gigs Full time
Job Overview
We are seeking an experienced Artificial Intelligence Infrastructure Specialist to join our team at Unreal Gigs. As a key member of our infrastructure team, you will play a crucial role in designing, building, and optimizing our machine learning infrastructure to support the needs of our organization.
Key Responsibilities:
Machine Learning...
-
Cloud Software Engineer
4 weeks ago
San Francisco, California, United States University of California - San Francisco Campus and Health Full time
Job Summary
The senior software engineer will lead the development, implementation, and maintenance of computing and data infrastructure to support the deployment and monitoring of Machine Learning (ML) and generative Artificial Intelligence (AI) tools at UCSF Health.
This includes leading the Health IT Platform for Advanced Computing (HIPAC), a cloud...
-
AI Healthcare Systems Deployment Engineer
3 days ago
San Francisco, California, United States Abridge Full time
About Abridge
Abridge is a trailblazing, mission-driven organization that is revolutionizing the healthcare industry through AI-powered technology.
Opportunities and Benefits
We offer a unique opportunity to work with talented individuals, have ownership and impact at a high-growth startup, and enjoy a range of benefits including flexible/unlimited PTO,...
-
AI Cloud Infrastructure Specialist
4 weeks ago
San Francisco, California, United States WEX, Inc. Full time
About WEX, Inc.
We're a global commerce platform and payments technology company forging the way in a rapidly changing environment. Our mission is to simplify the business of doing business for customers, freeing them to focus on what matters most. We're committed to building a consistent world-class user experience across our products and services,...
-
AI Infrastructure Architect
4 weeks ago
San Francisco, California, United States Abridge AI Inc. Full time
Abridge AI Inc. is a pioneering force in healthcare technology, utilizing artificial intelligence to empower deeper understanding and improve clinical documentation efficiency.
Role Overview
We are seeking an exceptional ML Systems Engineer to join our team, responsible for scaling and deploying machine learning models to handle increasing traffic demands and...
-
Cloud ML System Specialist
2 weeks ago
San Francisco, California, United States Fieldguide Full time
The Role:
- Design and implement infrastructure for ML model management, including training, deployment, and monitoring
- Build and maintain platforms for running ML algorithms at scale
- Develop systems for A/B testing, performance monitoring, and continuous model training
About You:
You have 3-4 years of experience in software engineering, DevOps, or a related...
-
Cloud AI Infrastructure Architect
1 week ago
San Francisco, California, United States WEX Full time
Overview:
Achieve technical excellence in AI infrastructure development with WEX, a leading global commerce platform and payments technology company. We're seeking an experienced Staff Cloud Engineer to spearhead our AI infrastructure initiatives, leveraging cloud-based solutions and cutting-edge technologies.
About the Role:
This is an exceptional opportunity...
-
San Francisco, California, United States WEX, Inc. Full time
About WEX, Inc.
WEX is an innovative global commerce platform and payments technology company that aims to simplify the business of doing business for customers. We are on a mission to create a consistent world-class user experience across our products and services, leveraging customer-focused innovations in big data, AI, and Risk.
We are looking for a highly...