AI Infrastructure Engineering Manager

3 weeks ago


San Francisco, California, United States ZipRecruiter Full time

**Job Summary**

We are seeking a highly skilled Senior Machine Learning Infrastructure Engineer to lead the design, development, and optimization of our machine learning infrastructure. As a key member of our team, you will work on challenging projects, from building scalable data pipelines to deploying and managing machine learning models in production environments.

**Key Responsibilities:**

  • Design and architect scalable and reliable infrastructure solutions to support machine learning workflows, including data ingestion, model training, evaluation, and deployment.
  • Develop and maintain data pipelines to ingest, preprocess, and transform data for training machine learning models, ensuring data quality, integrity, and scalability.
  • Build and optimize infrastructure for training machine learning models at scale, leveraging distributed computing frameworks and accelerators for performance and efficiency.
  • Implement monitoring and logging solutions to track the performance and health of machine learning infrastructure and models, proactively identifying and resolving issues.
  • Develop automation and orchestration tools to streamline machine learning workflows, reducing manual intervention and improving operational efficiency.

**Requirements:**

  • Bachelor's degree or higher in Computer Science, Engineering, Mathematics, or related field.
  • 5+ years of experience in infrastructure engineering, with a focus on machine learning infrastructure.
  • Strong programming skills in languages such as Python, Java, or Scala, with experience in distributed computing frameworks like Apache Spark or TensorFlow.
  • Experience with containerization technologies such as Docker and container orchestration platforms such as Kubernetes.
  • Strong understanding of machine learning concepts and techniques, with experience deploying and managing machine learning models in production environments.

**Benefits:**

  • Estimated salary range: $170,000 - $230,000 per year.
  • Comprehensive health, dental, and vision insurance plans.
  • Flexible work hours and remote work options.
  • Generous vacation and paid time off.
  • Professional development opportunities, including access to training programs, conferences, and workshops.
  • Vibrant and inclusive company culture with opportunities for growth and advancement.


  • San Francisco, California, United States Naptha AI Full time

    About Naptha AI: We are seeking exceptional Software Engineering interns to join Naptha AI and contribute to building the future of AI agent infrastructure. This internship offers hands-on experience working with frontier AI technology, backed by industry veterans and technical leaders through NVIDIA Inception, Google for Startups, and Microsoft for Startups. As...


  • San Francisco, California, United States Magic AI Full time

    Magic AI is a pioneering company building safe Artificial General Intelligence (AGI) to accelerate humanity's progress on the world's most pressing challenges. Our mission is to develop AGI that complements human capabilities, rather than replacing them. The Supercomputing Platform & Infrastructure team at Magic AI is responsible for designing and...


  • San Francisco, California, United States Magic AI Full time

    Company Overview: Magic AI is a cutting-edge technology company dedicated to building safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than...


  • San Mateo, California, United States Lumino Ai Full time

    Lumino Ai is a leading developer of innovative AI solutions. We're currently seeking a highly skilled Machine Learning Engineer to join our team. This is an excellent opportunity to contribute to the development of cutting-edge AI technologies and work with a talented group of professionals who share your passion for innovation. About the Role: We're looking...


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking a highly skilled DevOps Engineer to join our team at Together AI. As an MLOps engineer, you will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs. Key Responsibilities: Implement runtime systems that perform inference at scale using AI/ML models from simple models up to the largest...


  • San Francisco, California, United States Together AI Full time

    Company Overview: At Together AI, we believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. Job Description: We are seeking an experienced MLOps engineer to develop systems and APIs that enable our customers...

  • Infrastructure Lead

    2 weeks ago


    San Francisco, California, United States Naptha AI Full time

    Naptha AI is looking for a talented Cloud-Scale Distributed Systems Engineer to lead the development of our AI infrastructure. You will be responsible for designing and implementing scalable infrastructure for massive agent networks, architecting systems for efficient agent communication and coordination, and building robust, distributed systems for agent...


  • San Francisco, California, United States Abridge AI Inc. Full time

    Abridge AI Inc. is a pioneering force in healthcare technology, utilizing artificial intelligence to empower deeper understanding and improve clinical documentation efficiency. Role Overview: We are seeking an exceptional ML Systems Engineer to join our team, responsible for scaling and deploying machine learning models to handle increasing traffic demands and...


  • San Francisco, California, United States Naptha AI Full time

    Job Overview: We are seeking a highly skilled AI Infrastructure Strategist to help shape the future of AI agent infrastructure at Naptha AI. As an advisor, you will leverage your expertise to guide our journey in building the foundational infrastructure for the next wave of AI companies.


  • San Francisco, California, United States Naptha AI Full time

    Your Expertise Matters: We invite applications from individuals with diverse backgrounds and experiences who believe they can add value to our mission of building the infrastructure for the next generation of AI systems. As an Advisor, you will have the opportunity to shape the future of AI while working with a team backed by industry veterans and technical...


  • San Francisco, California, United States Naptha AI Full time

    About Naptha AI: We are seeking a skilled professional to shape the future of AI agent development and build relationships with frontier AI developers. This is a rare opportunity to influence the direction of AI infrastructure at a massive scale, backed by industry veterans and technical leaders.


  • San Francisco, California, United States Relyance AI Full time

    Job Summary: We're seeking an exceptional Senior Software Engineer - ML to lead the development of our AI solutions. As a key member of the team, you'll collaborate with cross-functional stakeholders to design and build scalable, high-performance systems that meet our customers' needs. Your expertise in machine learning and natural language processing will be...


  • San Francisco, California, United States Abridge AI Inc. Full time

    Abridge AI Inc. is a trailblazing organization that empowers deeper understanding in healthcare through innovative AI solutions. Our mission-driven approach has led to the development of industry-leading natural language understanding products. Job Overview: We are seeking a highly skilled Software Engineering Infrastructure Specialist to join our growing team...


  • San Francisco, California, United States Naptha AI Full time

    Company Overview: Naptha AI is a pre-seed company that aims to revolutionize AI agent infrastructure. Our team has deep expertise in AI and distributed systems, and we are looking for experienced technical leaders to help shape our technical strategy. Salary: We offer a highly competitive salary, with the amount based on your experience and qualifications. The...


  • San Francisco, California, United States Magical Tome Full time

    About Magical Tome: Tome is a unified platform for enterprise sellers and account managers. Our mission is to simplify complex research and strategic planning for sellers by leveraging state-of-the-art models. We use our expertise in AI/ML to surface the most actionable knowledge about a customer from within internal systems as well as from public information...


  • San Mateo, California, United States Lumino Ai Full time

    An exciting opportunity awaits at Lumino, where you'll have the chance to shape the future of AI infrastructure. As a software engineer, you'll work on designing, building, and maintaining systems that enable AI model creation. With a focus on scalability and reliability, you'll drive innovation and growth. Our team is collaborative and cross-functional,...

  • AI Engineer

    2 weeks ago


    San Mateo, California, United States Lumino Ai Full time

    Lumino Ai is a technology company that builds infrastructure enabling anyone to create AI models. We're backed by prominent VCs like Longhash Ventures, OP Crypto, Protocol Labs, Quaker Capital, Escape Velocity, and OrangeDAO. About the Role: We're seeking a highly skilled AI Engineer to join our team and help set the foundations of the company. As an AI...


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking an experienced Systems Research Engineer to join our team at Together AI. As a key member of our research-driven artificial intelligence company, you will play a crucial role in researching and building the next generation AI platform. Company Overview: Together AI is committed to creating open and transparent AI systems that drive...


  • San Mateo, California, United States Lumino Ai Full time

    About Us: Lumino is a leading provider of AI infrastructure solutions. We're passionate about empowering humans to unlock the potential of AI. Our mission is to create a world where AI is accessible to everyone. We're looking for a talented Machine Learning Engineer to join our team. As a key member of our engineering team, you will be responsible for designing...


  • San Francisco, California, United States ZipRecruiter Full time

    Job Description: We're looking for a highly skilled AI Infrastructure Specialist to join our team of engineers and data scientists. As an AI Infrastructure Specialist, you'll play a key role in designing, building, and optimizing our AI infrastructure to support the needs of our organization. About the Role: Design and Build Infrastructure: Design and build...