AI Infrastructure Systems Developer

3 weeks ago


San Francisco, California, United States ZipRecruiter Full time

Job Overview

ZipRecruiter is seeking an experienced AI Infrastructure Systems Developer to join our team in the United States, with a salary range of $170,000 - $200,000 per year.

The successful candidate will be responsible for designing and building infrastructure that supports cutting-edge AI solutions, working closely with data scientists and software engineers to ensure seamless deployments and continuous delivery of AI models into production environments.

About the Role

We're looking for a skilled professional with 3+ years of experience in infrastructure engineering, particularly in building and maintaining AI or machine learning infrastructure in production environments. The ideal candidate will have a strong background in cloud services, containerization, and orchestration tools, as well as expertise in optimizing infrastructure for AI workloads.

Key Responsibilities

  • Design and Build AI Infrastructure: Architect and implement scalable infrastructure that supports AI workloads, including machine learning model training, large-scale data processing, and real-time inference.
  • Support AI Model Development and Deployment: Collaborate with data scientists and engineers to build pipelines that automate the end-to-end machine learning lifecycle, from data ingestion to model training, deployment, and monitoring.
  • Optimize AI Workloads for Performance: Implement strategies to optimize compute resources for AI workloads, including GPU/TPU provisioning, memory management, and parallel processing.
  • Cloud and On-Premise Infrastructure Management: Manage cloud-based AI platforms (AWS, GCP, Azure) as well as on-premise infrastructure for AI development, handling everything from infrastructure as code (IaC) to container orchestration (Docker, Kubernetes).
  • Automation and Continuous Integration/Deployment (CI/CD): Implement and maintain CI/CD pipelines for machine learning models to enable rapid experimentation, testing, and deployment.
  • Security and Compliance: Ensure that the AI infrastructure complies with security best practices and regulatory requirements, implementing robust access controls, encryption, and other security measures to protect sensitive data and AI models.

Requirements

  • Required Skills and Qualifications
    • AI Infrastructure Expertise: Deep experience in designing and building infrastructure that supports AI and machine learning workloads.
    • Cloud Platforms and Tools: Strong experience with cloud platforms like AWS, GCP, or Azure, particularly with AI services and infrastructure management.
    • Automation and DevOps: Expertise in automating infrastructure provisioning and model deployment using tools such as Terraform, Ansible, Jenkins, or GitLab CI.
    • GPU/TPU Optimization: Hands-on experience with GPU/TPU optimization for machine learning and deep learning tasks.
    • Security and Compliance: Strong understanding of security best practices, including data encryption, access management, and compliance with regulations like GDPR and HIPAA.
    • Educational Requirements
      • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
      • Certifications in cloud platforms (AWS, GCP, Azure) or DevOps tools are a plus.
    • Experience Requirements
      • 3+ years of experience in infrastructure engineering, with a focus on building and maintaining AI or machine learning infrastructure in production environments.
      • Proven experience with cloud services, containerization, orchestration tools, and optimizing infrastructure for AI workloads.
      • Experience working with data scientists and machine learning engineers to support model development, testing, and deployment.

Benefits

  • Comprehensive Benefits Package, including health insurance, paid time off, and wellness programs.
  • Professional Development Opportunities, including training, certification reimbursement, and career advancement programs.

  • Infrastructure Lead

    3 weeks ago


    San Francisco, California, United States Naptha AI Full time

    Naptha AI is looking for a talented Cloud-Scale Distributed Systems Engineer to lead the development of our AI infrastructure. You will be responsible for designing and implementing scalable infrastructure for massive agent networks, architecting systems for efficient agent communication and coordination, and building robust, distributed systems for agent...


  • San Francisco, California, United States Naptha AI Full time

    About Naptha AI: We are seeking exceptional Software Engineering interns to join Naptha AI and contribute to building the future of AI agent infrastructure. This internship offers hands-on experience working with frontier AI technology, backed by industry veterans and technical leaders through NVIDIA Inception, Google for Startups, and Microsoft for Startups. As...


  • San Francisco, California, United States Naptha AI Full time

    Company Overview: Naptha AI is a pre-seed company that aims to revolutionize AI agent infrastructure. Our team has deep expertise in AI and distributed systems, and we are looking for experienced technical leaders to help shape our technical strategy. Salary: We offer a highly competitive salary, with the amount based on your experience and qualifications. The...


  • San Mateo, California, United States Lumino Ai Full time

    An exciting opportunity awaits at Lumino, where you'll have the chance to shape the future of AI infrastructure. As a software engineer, you'll work on designing, building, and maintaining systems that enable AI model creation. With a focus on scalability and reliability, you'll drive innovation and growth. Our team is collaborative and cross-functional,...


  • San Francisco, California, United States Naptha AI Full time

    Your Expertise Matters: We invite applications from individuals with diverse backgrounds and experiences who believe they can add value to our mission of building the infrastructure for the next generation of AI systems. As an Advisor, you will have the opportunity to shape the future of AI while working with a team backed by industry veterans and technical...


  • San Francisco, California, United States Naptha AI Full time

    About Naptha AI: We are seeking a skilled professional to shape the future of AI agent development and build relationships with frontier AI developers. This is a rare opportunity to influence the direction of AI infrastructure at a massive scale, backed by industry veterans and technical leaders.


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking a highly skilled DevOps Engineer to join our team at Together AI. As an MLOps engineer, you will develop systems and APIs that enable our customers to perform inference and fine-tune LLMs. Key Responsibilities: Implement runtime systems that perform inference at scale using AI/ML models from simple models up to the largest...


  • San Francisco, California, United States Together AI Full time

    Company Overview: At Together AI, we believe open and transparent AI systems will drive innovation and create the best outcomes for society. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. Job Description: We are seeking an experienced MLOps engineer to develop systems and APIs that enable our customers...


  • San Francisco, California, United States Naptha AI Full time

    Job Overview: We are seeking a highly skilled AI Infrastructure Strategist to help shape the future of AI agent infrastructure at Naptha AI. As an advisor, you will leverage your expertise to guide our journey in building the foundational infrastructure for the next wave of AI companies.


  • San Francisco, California, United States Magic AI Full time

    Magic AI is a pioneering company building safe Artificial General Intelligence (AGI) to accelerate humanity's progress on the world's most pressing challenges. Our mission is to develop AGI that complements human capabilities, rather than replacing them. The Supercomputing Platform & Infrastructure team at Magic AI is responsible for designing and...


  • San Francisco, California, United States ZipRecruiter Full time

    Job Title: AI Infrastructure Systems Architect. About the Role: We are seeking an experienced AI Infrastructure Systems Architect to design and build scalable infrastructure that supports AI workloads. The ideal candidate will have a deep understanding of cloud and on-premise infrastructure solutions and be able to optimize them for AI. Key...


  • San Francisco, California, United States Hamming AI Full time

    Backend and Infrastructure Engineer: We are a fast-growing voice AI testing company that has seen significant revenue growth. As a Backend and Infrastructure Engineer, you will play a crucial role in scaling our current products and infrastructure to support 100x growth. Optimize and productize processes that humans currently handle, ensuring seamless...


  • San Francisco, California, United States Hinge-Health Full time

    About Us: Hinge Health is a pioneering digital health company that provides cutting-edge, evidence-based solutions for musculoskeletal (MSK) pain management. Our innovative approach combines personalized exercise therapy and virtual care to help individuals manage chronic pain, improving their quality of life while reducing healthcare costs. The AI Platform...


  • San Francisco, California, United States Naptha AI Full time

    Naptha AI is seeking an exceptional AI Agent Developer Evangelist to shape the future of AI agent development and nurture relationships with frontier AI developers. This role involves building and deploying next-generation AI agents, creating technical content, and shaping the broader agent development ecosystem. Job Description: About this role: We are looking...


  • San Francisco, California, United States Cambio AI Inc. Full time

    About Cambio AI Inc.: We are a cutting-edge platform that enables the creation and deployment of AI workers to automate communication. Our innovative solution connects to any system or data source, handling phone calls, email, and messages with ease. Our primary focus is on the logistics industry, which relies heavily on communication for tasks such as booking,...


  • San Francisco, California, United States Unum AI Full time

    At Unum AI, we're pushing the boundaries of data infrastructure to empower applications that require extreme scale and artificial intelligence. We're on a mission to design next-generation data systems that unlock unprecedented capabilities in data-intensive and AI-driven applications. Key Responsibilities: Optimizing and implementing existing algorithms to...


  • San Francisco, California, United States Waveforms Full time

    About Us: WaveForms AI is a leading Audio Large Language Models (LLMs) company, revolutionizing human-AI interactions through advanced research and innovative products. Job Overview: We are seeking an experienced Azure Cloud Architect to lead the design and implementation of our large-scale training and real-time inference pipelines. The ideal candidate will have...


  • San Francisco, California, United States Perplexity AI Full time

    About the Position: We are seeking a skilled engineer to join our small team and contribute to the development of AI-powered search interfaces. This role involves building complex orchestration systems, collaborating with frontend engineers, and working closely with machine learning teams.


  • San Francisco, California, United States Naptha AI Full time

    Job Description: Naptha AI is seeking an exceptional Research Scientist intern to contribute to advancing the state of AI agent systems. This internship offers hands-on research experience at the frontier of AI technology, backed by industry veterans and technical leaders. We're building the foundational infrastructure for the next wave of AI companies,...


  • San Francisco, California, United States Magic AI Full time

    Company Overview: Magic AI is a cutting-edge technology company dedicated to building safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than...