Current jobs related to Senior Software Engineer, Infrastructure - San Francisco - CentML Inc.


  • San Francisco, United States Acceler8 Talent Full time

    Senior Software Engineer (AI Infrastructure / MLOps) Introduction: We are seeking a Senior Software Engineer (AI Infrastructure / MLOps) to join our team. This role offers a unique opportunity to work on cutting-edge MLOps technologies and develop large-scale web applications for data-centric AI. About the Company: Our team comprises MIT PhDs who have worked...


  • San Francisco, United States DoorDash Full time

    About the Role: As a Senior Android Software Engineer on the Android Infrastructure team, you will build the foundational pieces for all DoorDash Android applications. These include runtime libraries, build systems, and development tools. You will work closely with engineers, technical product managers, and engineering managers across all parts of the...


  • San Francisco, United States Discord Full time

    Senior Software Engineer - Media Infrastructure. Discord - San Francisco, CA. This position is US based only. Discord is about giving people the power to create space to find belonging in their lives. We want to make it easier for you to talk regularly with the people you care about. We want you to build genuine relationships with your friends and communities...


  • San Francisco, United States Acceler8 Talent Full time

    Senior Software Engineer (AI Infrastructure / MLOps) Location: San Francisco (3 days per week in office) Introduction: We are seeking a Senior Software Engineer (AI Infrastructure / MLOps) to join a pioneering AI startup focused on enhancing data quality for machine learning. This role offers the chance to work on large-scale web applications and tackle complex...


  • San Ramon, United States Dew Software Full time

    Job Description: Dew Software, a renowned company in the Digital Transformation space, is seeking a skilled Infrastructure Engineer to join their team. With a strong commitment to quality and excellence, Dew Software collaborates with Fortune 500 companies, supporting them in their digital transformation journey. As an Infrastructure Engineer,...


  • San Francisco, United States Acceler8 Talent Full time

    Job Opportunity: Senior Software Engineer (AI Infrastructure / MLOps). We are seeking a highly skilled Senior Software Engineer to join our innovative team and work on cutting-edge data-centric AI solutions. As a Senior Software Engineer, you will have the opportunity to develop large-scale web applications and tackle challenging problems related to model...


  • San Francisco, United States Delta System and Software Inc. Full time

    Job Description: Job Title: Senior IT Infrastructure Contractor / Client: Nektar Therapeutics / Rate: $100-105/hr C2C. Location: SFO Bay Area, CA, onsite / hybrid, no remote. Duration: 6-12 months. Mandatory skills: DevOps engineer with broad experience with O365, AD, application deployment, VMware, etc. This is a key position...


  • San Francisco, California, United States Seesaw Full time

    Position Overview: Seesaw is on the lookout for a skilled back-end Software Engineer with a focus on infrastructure to enhance our Core Platform Engineering team. In this role, you will be instrumental in establishing the essential framework of our platform, enabling various product teams to efficiently deliver outstanding user experiences at scale. Your...


  • San Francisco, California, United States Seesaw Full time

    Position Overview: Seesaw is in search of a skilled back-end Software Engineer with a focus on infrastructure to become a vital member of our Core Platform Engineering division. In this capacity, you will significantly influence the underlying architecture of our platform, building the essential layers that empower our various product teams to efficiently...


  • San Francisco, California, United States Seesaw Full time

    Position Overview: Seesaw is in search of a skilled back-end Software Engineer with a focus on infrastructure to enhance our Core Platform Engineering division. In this role, you will be instrumental in establishing the core architecture of our platform, enabling various product teams to efficiently deliver outstanding user experiences at scale. Your duties...


  • San Francisco, California, United States Anthropic Limited Full time

    Position Overview: Anthropic Limited is in search of skilled and seasoned Infrastructure Engineers to enhance our capabilities in developing, scaling, and maintaining innovative AI systems. By becoming part of our Infrastructure division, you will engage with pioneering AI technologies and play a significant role in advancing frontier models, furthering...


  • San Francisco, California, United States Anthropic Limited Full time

    Position Overview: Anthropic Limited is on the lookout for skilled and seasoned Infrastructure Engineers to enhance our efforts in developing, scaling, and maintaining innovative AI systems. This role presents an exciting opportunity to engage with advanced AI technologies and contribute to the evolution of state-of-the-art models, aligning with Anthropic's...


  • San Francisco, California, United States Anthropic Limited Full time

    Position Overview: Anthropic Limited is on the lookout for skilled and seasoned Infrastructure Engineers to enhance our efforts in the development, scaling, and upkeep of our advanced AI systems. As part of the Infrastructure team, you will engage with pioneering AI technologies and play a vital role in advancing frontier models, aligning with Anthropic's...


  • San Francisco, California, United States Anthropic Limited Full time

    Position Overview: Anthropic Limited is in search of skilled and seasoned Infrastructure Engineers to enhance our capabilities in developing, scaling, and maintaining advanced AI systems. By becoming a part of our Infrastructure division, you will engage with pioneering AI technologies and play a vital role in advancing frontier models, aligning with...


  • San Francisco, California, United States Anthropic Limited Full time

    Position Overview: Anthropic Limited is on the lookout for skilled and seasoned Infrastructure Engineers to enhance our capabilities in developing, scaling, and maintaining innovative AI systems. As part of our Infrastructure division, you will engage with pioneering AI technologies and play a vital role in advancing our mission to establish safe and...


  • San Francisco, United States Orb Full time

    Mission: Orb is on an ambitious mission to provide every business with the infrastructure to unlock their revenue. Best-in-class businesses find ways to effectively align their monetization to product usage, whether that's through seats, consumption, feature limits, or usage-based tiers. Orb brings that opportunity to every software company. We are reimagining...


Senior Software Engineer, Infrastructure

2 months ago


San Francisco, United States CentML Inc. Full time

About Us

We believe AI will fundamentally transform how people live and work. CentML's mission is to massively reduce the cost of developing and deploying ML models so we can enable anyone to harness the power of AI and everyone to benefit from its potential. Our founding team is made up of experts in AI, compilers, and ML hardware and has led efforts at companies like Amazon, Google, Microsoft Research, Nvidia, Intel, Qualcomm, and IBM. Our co-founder and CEO, Gennady Pekhimenko, is a world-renowned expert in ML systems who holds multiple academic and industry research awards from Google, Amazon, Facebook, and VMware.

Position Overview

We are seeking a highly motivated and skilled senior infrastructure engineer to join our team in a key role focused on designing, developing, and maintaining the CentML platform, which offers cost-effective infrastructure for serving and training large-scale machine learning models. As an infrastructure engineer, you will be responsible for designing a deployment infrastructure for ML training and inference jobs over GPU clusters that span multiple cloud service providers such as AWS, GCP, Azure, CoreWeave, and OCI. You will also be responsible for leading a team of engineers and building a scalable, performant, and reliable platform that enables our customers to seamlessly access and use the comprehensive suite of ML services we offer.

Responsibilities

- Design and lead the development of the deployment infrastructure of the CentML platform. The deployment infrastructure manages the hardware resources necessary to deploy ML training and inference applications.
- Implement GPU cluster scheduling solutions for large-scale ML training and inference workloads to efficiently utilize the hardware resources in the GPU cluster.
- Communicate with our product teams and define new features and goals for improving the CentML platform.

Qualifications

- 4+ years of experience working with containerized deployment systems (e.g., Kubernetes, OpenShift, Terraform). A big plus if you have contributed to Kubernetes and have expertise in container runtime technologies like Docker Engine, containerd, or CRI-O.
- Experience with deploying and managing cloud infrastructure on AWS, GCP, Azure.
- Past experience in building GPU clusters for large-scale ML training and inference is desirable.
- Knowledge of GPU architecture and Nvidia GPU virtualization technologies is highly desirable.
- Strong coding skills in languages like Python, Java, Go, and/or C/C++.

Benefits & Perks

- An open and inclusive culture and work environment
- Fully stocked kitchen at the office
- Full health and dental benefits
- Parental leave top-up for 6 months
- Continuous education budget
- Generous vacation: we're not saying unlimited, but if you need extra time to recharge, just ask

At CentML, we celebrate our differences and value cultivating an inclusive environment for all. We welcome applications of all kinds and are committed to providing an equal opportunity process.
