AI Infrastructure Engineer

4 weeks ago


San Francisco, United States Unreal Gigs Full time

Are you passionate about designing and building the robust infrastructure that powers cutting-edge AI solutions? Do you thrive on creating scalable, high-performance systems that support AI workloads, from training machine learning models to deploying real-time inference? If you're excited about building the backbone for the future of AI, then our client has the perfect opportunity for you. We’re looking for an AI Infrastructure Engineer (aka The AI Backbone Builder) to design, deploy, and maintain the infrastructure that powers AI innovation.

As an AI Infrastructure Engineer at our client, you’ll play a critical role in building the platforms that support machine learning and AI development across the organization. You’ll work closely with data scientists, software engineers, and DevOps teams to ensure that AI systems run efficiently, securely, and at scale. Your work will enable fast experimentation, seamless deployments, and the continuous delivery of AI models into production.

Key Responsibilities:

  1. Design and Build AI Infrastructure: Architect and implement scalable infrastructure that supports AI workloads, including machine learning model training, large-scale data processing, and real-time inference. You’ll design solutions that ensure high availability, fault tolerance, and performance optimization.
  2. Support AI Model Development and Deployment: Collaborate with data scientists and engineers to build pipelines that automate the end-to-end machine learning lifecycle, from data ingestion to model training, deployment, and monitoring. You’ll ensure smooth integration of AI models into production environments.
  3. Optimize AI Workloads for Performance: Implement strategies to make efficient use of compute resources for AI workloads, including GPU/TPU provisioning, memory management, and parallel processing. You’ll ensure that infrastructure is tuned to the specific demands of AI and machine learning tasks.
  4. Cloud and On-Premise Infrastructure Management: Manage AI platforms on cloud providers (AWS, GCP, Azure) as well as on-premise infrastructure for AI development. You’ll handle everything from infrastructure as code (IaC) to containerization and orchestration (Docker, Kubernetes), ensuring seamless scalability and automation.
  5. Automation and Continuous Integration/Deployment (CI/CD): Implement and maintain CI/CD pipelines for machine learning models to enable rapid experimentation, testing, and deployment. You’ll automate workflows and model updates, and monitor the performance of AI systems in production.
  6. Security and Compliance: Ensure that the AI infrastructure complies with security best practices and regulatory requirements. You’ll implement robust access controls, encryption, and other security measures to protect sensitive data and AI models.
  7. Monitor and Troubleshoot AI Infrastructure: Continuously monitor the health and performance of AI infrastructure, identifying bottlenecks, reducing latency, and troubleshooting issues. You’ll ensure the reliability of systems, optimizing them as AI demands grow.

Required Skills:

  • AI Infrastructure Expertise: Deep experience in designing and building infrastructure that supports AI and machine learning workloads. You’re familiar with both cloud and on-premise infrastructure solutions and know how to optimize them for AI.
  • Cloud Platforms and Tools: Strong experience with cloud platforms like AWS, GCP, or Azure, particularly with AI services and infrastructure management. You’re comfortable with tools like Amazon SageMaker, Google Cloud AI Platform (now Vertex AI), or Azure Machine Learning, as well as container orchestration with Kubernetes.
  • Automation and DevOps: Expertise in automating infrastructure provisioning and model deployment using tools such as Terraform, Ansible, Jenkins, or GitLab CI. You’re skilled at managing CI/CD pipelines for AI model deployment.
  • GPU/TPU Optimization: Hands-on experience with GPU/TPU optimization for machine learning and deep learning tasks. You understand how to manage compute resources to maximize efficiency for AI workloads.
  • Security and Compliance: Strong understanding of security best practices, including data encryption, access management, and compliance with regulations like GDPR and HIPAA.

Educational Requirements:

  • Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field. Equivalent experience in AI infrastructure or DevOps is highly valued.
  • Certifications in cloud platforms (AWS, GCP, Azure) or DevOps tools are a plus.

Experience Requirements:

  • 3+ years of experience in infrastructure engineering, with a focus on building and maintaining AI or machine learning infrastructure in production environments.
  • Proven experience with cloud services, containerization, orchestration tools, and optimizing infrastructure for AI workloads.
  • Experience working with data scientists and machine learning engineers to support model development, testing, and deployment.

  • San Francisco, California, United States Naptha AI Full time

    About Naptha AI: We are seeking exceptional Software Engineering interns to join Naptha AI and contribute to building the future of AI agent infrastructure. This internship offers hands-on experience working with frontier AI technology, backed by industry veterans and technical leaders through NVIDIA Inception, Google for Startups, and Microsoft for Startups. As...


  • San Francisco, California, United States Scale AI Full time

    Cloud AI Engineer Position at Scale: We are seeking an experienced Cloud AI Engineer to join our team at Scale, a leading provider of AI solutions. As a Cloud AI Engineer, you will play a key role in designing and developing our cloud infrastructure platforms and systems. The ideal candidate will have extensive experience in software development and a deep...


  • San Francisco, California, United States Together AI Full time

    Are you a skilled DevOps engineer looking to take your career to the next level? Do you have a passion for designing and building automated infrastructure pipelines? We are seeking a talented Senior DevOps Engineer to join our cloud engineering team at Together AI. About the Role: We are hiring a highly experienced Senior DevOps Engineer to lead the...


  • San Francisco, California, United States Magic AI Full time

    Company Overview: Magic AI is a cutting-edge technology company dedicated to building safe Artificial General Intelligence (AGI) that accelerates humanity's progress on the world's most important problems. We believe the most promising path to safe AGI lies in automating research and code generation to improve models and solve alignment more reliably than...


  • San Francisco, California, United States Unum AI Full time

    At Unum AI, we're revolutionizing data infrastructure with our cutting-edge technology. We're seeking a highly skilled AI Infrastructure Engineer to join our team in designing and implementing next-generation database management systems. About the Role: This is an exciting opportunity for a passionate engineer to orchestrate software development and hardware...


  • San Francisco, California, United States Abridge AI Inc. Full time

    Abridge AI Inc. is a pioneering force in healthcare technology, utilizing artificial intelligence to empower deeper understanding and improve clinical documentation efficiency. Role Overview: We are seeking an exceptional ML Systems Engineer to join our team, responsible for scaling and deploying machine learning models to handle increasing traffic demands and...


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking an experienced Systems Research Engineer to join our team at Together AI. As a key member of our research-driven artificial intelligence company, you will play a crucial role in researching and building the next-generation AI platform. Company Overview: Together AI is committed to creating open and transparent AI systems that drive...


  • San Francisco, United States ZipRecruiter Full time

    Job Description: Are you passionate about designing and building the robust infrastructure that powers cutting-edge AI solutions? Do you thrive on creating scalable, high-performance systems that support AI workloads, from training machine learning models to deploying real-time inference? If you're excited about building the backbone for the future of AI, then...


  • San Francisco, United States Unreal Gigs Full time

    Are you passionate about designing and building the robust infrastructure that powers cutting-edge AI solutions? Do you thrive on creating scalable, high-performance systems that support AI workloads, from training machine learning models to deploying real-time inference? If you're excited about building the backbone for the future of AI, then our client has...


  • San Francisco, California, United States Abridge AI Inc. Full time

    Abridge AI Inc. is a trailblazing organization that empowers deeper understanding in healthcare through innovative AI solutions. Our mission-driven approach has led to the development of industry-leading natural language understanding products. Job Overview: We are seeking a highly skilled Software Engineering Infrastructure Specialist to join our growing team...

  • Software Engineer

    1 week ago


    San Francisco, California, United States Stack AI Full time

    About Stack AI: We're a fast-growing startup on a mission to democratize access to Large Language Models. Our user-friendly and intuitive No-Code platform integrates the best AI models, common data sources, and SaaS tools. Our Traction is impressive: launched 8 months ago with over 65,000 users and 300+ paying customers, including public companies and...


  • San Francisco, California, United States Perplexity AI Full time

    AI-Driven Search Solutions: Technical Lead Position. We're looking for an experienced Senior DevOps Engineer to join our team at Perplexity AI. As a key member of our infrastructure team, you'll play a crucial role in shaping the technical direction and implementing scalable solutions for our rapidly growing search platform. Technical Requirements: You will be...


  • San Francisco, CA, United States Unreal Gigs Full time

    Are you passionate about designing and building the robust infrastructure that powers cutting-edge AI solutions? Do you thrive on creating scalable, high-performance systems that support AI workloads, from training machine learning models to deploying real-time inference? If you're excited about building the backbone for the future of AI, then our client ...

  • AI Engineering Lead

    1 week ago


    San Francisco, California, United States Avala AI Full time

    Unlock Your Potential as an AI Engineering Lead. Avala AI is a cutting-edge technology company that empowers communities through dignified digital work. We believe in connecting people to equitable wages, ensuring the highest quality of service for our customers and the highest quality of life for our team. We are seeking an experienced Full Stack Engineer who...


  • San Francisco, California, United States ZipRecruiter Full time

    Exciting Opportunity at ZipRecruiter. About the Role: We are seeking an exceptional AI Infrastructure Engineering Director to join our team. As a key member of our infrastructure engineering group, you will be responsible for leading the design, development, and optimization of our machine learning infrastructure solutions. The ideal candidate will have a strong...


  • San Francisco, California, United States Unreal Gigs Full time

    Design and Build AI Infrastructure: Architect and implement scalable infrastructure that supports AI workloads, including machine learning model training, large-scale data processing, and real-time inference. As an AI Infrastructure Engineer, you'll design solutions that ensure high availability, fault tolerance, and performance optimization.


  • San Francisco, California, United States Unreal Gigs Full time

    Job Overview: We are seeking an experienced Cloud and Machine Learning Architect to lead our AI infrastructure engineering initiatives. As a key member of our team, you will design, develop, and optimize scalable and reliable infrastructure solutions to support machine learning workflows.


  • San Francisco, California, United States Abridge AI Full time

    Unlock the Future of Healthcare with Abridge AI. Abridge AI is a pioneering organization dedicated to revolutionizing medical conversations through AI. We are seeking an experienced Full Stack Software Engineer to join our team and help us build innovative solutions for healthcare professionals. About the Role: This position offers a unique opportunity to design,...


  • San Francisco, California, United States Figma Full time

    Figma is a platform that makes design accessible to all. Born on the Web, Figma helps entire product teams brainstorm, design and build better products - from start to finish. Job Description: We're looking for an AI engineer who has experience building data pipelines to collect high-quality data, and evaluation systems to evaluate AI models. You will be...


  • San Francisco, California, United States Crusoe Energy Inc Full time

    About Crusoe Energy Inc. Crusoe Energy is a pioneering company on a mission to harness the value of stranded energy resources through cutting-edge computation. We aim to create a harmonious balance between the long-term interests of the climate and the future of global computing infrastructure. As data centers continue to consume an exponentially growing power...