Principal AI/ML Infrastructure and Operations Engineer

3 weeks ago


San Francisco, California, United States UnitedHealth Group Full time

At UnitedHealth Group, we're committed to helping people live healthier lives and making the health system work better for everyone. As the Principal AI/ML Infrastructure and Operations Engineer, you'll play a critical role in ensuring the stability, reliability, scalability, and performance of our United AI Studio platform. This individual contributor role requires deep expertise in building and managing large-scale AI/ML platforms, along with strategic guidance and hands-on technical leadership.

Key Responsibilities:

  • Design and implement scalable infrastructure solutions that align with our company's strategic goals and operational needs.
  • Oversee the management of multi-cloud (Azure, AWS, GCP) and hybrid infrastructure environments, ensuring secure, scalable hosting with optimal performance and cost-effectiveness.
  • Drive automation across the infrastructure lifecycle, leveraging Infrastructure as Code (IaC) and DevOps principles to streamline deployment and management processes.
  • Develop and implement monitoring frameworks for infrastructure, identifying areas for performance improvement and optimization, and ensuring high availability.
  • Collaborate with cybersecurity teams to ensure all systems and operations comply with industry standards and are secure against evolving threats.
  • Forecast and manage capacity requirements for the AI/ML infrastructure while identifying opportunities to reduce costs without compromising performance.
  • Stay current with cloud technologies, AI/ML infrastructure advancements, and DevOps practices, and lead the organization in adopting best practices.

Requirements:

  • Bachelor's degree in computer science, information technology, or a related field.
  • 10+ years of infrastructure experience, including a proven record of managing large-scale, cloud-based, enterprise-level software platforms and a deep understanding of multi-cloud architectures.
  • 6+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, and the like.
  • 5+ years of practical experience in containerization technologies (Kubernetes, Docker) and orchestration for large-scale workloads.

Preferred Qualifications:

  • Master's degree in computer science, information technology, or a related field.
  • Experience in monitoring and optimizing performance of distributed systems, particularly AI/ML pipelines and data processing workflows.
  • High-availability systems experience, with demonstrated success in building and maintaining highly available, fault-tolerant infrastructure.
  • Proven security & compliance knowledge, with solid understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks.

About UnitedHealth Group:

UnitedHealth Group is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. We believe everyone – of every race, gender, sexuality, age, location, and income – deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health, which are disproportionately experienced by people of color, historically marginalized groups, and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes – an enterprise priority reflected in our mission.

Equal Employment Opportunity:

UnitedHealth Group is an Equal Employment Opportunity/Affirmative Action employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, protected veteran status, disability status, sexual orientation, gender identity or expression, marital status, genetic information, or any other characteristic protected by law.



  • San Francisco, California, United States Together AI Full time

    Job Responsibilities: Infrastructure Development: Identify and resolve infrastructure gaps to ensure reliable, efficient, and scalable AI/ML solutions. AI/ML Solutions: Develop advanced AI/ML infrastructure solutions to enhance the efficiency of our ML teams, leveraging expertise in distributed systems and large-scale data processing. System Design: Design and...


  • San Francisco, California, United States Together AI Full time

    AI Infrastructure Expertise: Design and implement high-performance AI/ML infrastructure, ensuring scalability, availability, and efficient resource utilization. Automation and Optimization: Develop and deploy automation tools, monitoring solutions, and operational strategies to streamline infrastructure management and reduce manual tasks. Collaboration and...


  • San Francisco, California, United States Acceler8 Talent Full time

    About the Role: We're seeking a highly skilled Principal AI Infrastructure Specialist to join our pioneering team at the forefront of AI and ML technology. As a key member of our team, you'll collaborate with researchers and product engineers to create innovative product experiences powered by large language models. Key Responsibilities: Design and implement...


  • San Francisco, California, United States Magical Tome Full time

    About Tome: Tome is a cutting-edge platform that empowers enterprise sellers and account managers to simplify complex research and strategic planning. Our state-of-the-art models leverage thousands of data sources to surface actionable knowledge about customers. A team of experienced sellers, engineers, and researchers tunes and customizes our system to meet...

  • ML Platform Engineer

    3 weeks ago


    San Francisco, California, United States Abridge Full time

    About the Role: Abridge is seeking a highly skilled ML Platform Engineer to join our team and help us scale our AI infrastructure. As a key member of our engineering team, you will be responsible for designing, implementing, and deploying machine learning models at scale. Our ideal candidate has a strong background in Python, Kubernetes, and cloud environments,...


  • San Francisco, California, United States Voxel Full time

    Revolutionize Workplace Safety with Voxel: Voxel is a pioneering company that's changing the game when it comes to workplace safety and operations. We're passionate about using AI and computer vision technology to prevent workplace incidents and make the world a safer place. About the Role: We're seeking a highly skilled Senior ML Infrastructure Engineer to join...


  • San Jose, California, United States Cisco Full time

    About the Role: Cisco is seeking a highly skilled Principal AI/ML Engineer to join our Security AI team. As a key member of our team, you will play a critical role in designing, implementing, and evolving our AI platforms and products. Key Responsibilities: Lead with technical and industry vision, driving our team's strategy in AI and Data Science. Define and...


  • San Francisco, California, United States Voxel Full time

    Voxel is a pioneering company revolutionizing workplace safety and operations with cutting-edge AI and computer vision technology. We're seeking a highly skilled Senior ML Infrastructure professional to join our team and design and implement systems to support Voxel's ML development. The ideal candidate will have a strong background in software engineering,...


  • San Francisco, California, United States Together AI Full time

    Job Summary: We are seeking a highly skilled Senior AI Infrastructure Engineer to join our team at Together AI. As a key member of our infrastructure team, you will be responsible for designing, building, and maintaining our next-generation AI platform. Key Responsibilities: Design and implement highly available AI infrastructure solutions. Develop and maintain our...


  • San Francisco, California, United States Genmo Full time

    Job Description: We are seeking a highly skilled Senior Staff AI Infrastructure Engineer to join our team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. The ideal candidate will have a strong background in software engineering, with a focus on backend systems and ML infrastructure. Key Responsibilities: Design...


  • San Francisco, California, United States Deepscribe Full time

    About the Role: We are seeking a Senior Software Engineer to join our ML Infrastructure team at DeepScribe. As a key member of our team, you will be responsible for building and optimizing infrastructure for audio processing, transcription, and LLM orchestration, ensuring scalability, reliability, and performance. You will collaborate with product and AI...


  • San Francisco, California, United States Genmo Full time

    Role Overview: We are seeking a senior software engineer to join our inference team at Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. The successful candidate will be responsible for designing and scaling our inference systems to support millions of users across multiple data centers. Key Responsibilities: Develop...


  • San Francisco, California, United States Unity Technologies Full time

    About the Role: We're seeking a skilled Senior Data and ML Infrastructure Engineer to join our team at Unity. As a key member of our Data & ML Platform team, you will design and optimize large-scale data platforms and machine learning infrastructure systems for efficiency, reliability, and cost-effectiveness. Key Responsibilities: Design and optimize large-scale...

  • AI Systems Engineer

    3 weeks ago


    San Francisco, California, United States Distyl AI Full time

    At Distyl AI, we're pushing the boundaries of AI innovation to power core operational workflows for the Fortune 500. We're seeking an experienced Frontend Engineer to join our team and help define the future of work with a focus on human value. You will build the UI/UX patterns in which AI is deployed and used by the world's most important institutions....


  • San Francisco, California, United States Together AI Full time

    About the Role: We are seeking a highly skilled Senior AI Infrastructure Engineer to join our team at Together AI. As a key member of our infrastructure team, you will be responsible for designing and building the next generation of our AI platform, leveraging open-source technologies to enable and accelerate our growth. Key Responsibilities: Design and...


  • San Diego, California, United States Yoh Full time

    Job Title: Senior Cloud AI/ML Engineer. Job Type: Contract. Industry: Government Technology. Location: Remote. Job Description: We are seeking a highly skilled Senior Cloud AI/ML Engineer to join our team at Yoh. As a Senior Cloud AI/ML Engineer, you will be responsible for designing, building, and monitoring Azure infrastructure for AI services. You will work...


  • San Francisco, California, United States Abridge Full time

    Job Description: Abridge is a pioneering healthcare technology company that's revolutionizing the way medical conversations are recorded and understood. As an ML Infrastructure Engineer, you'll play a critical role in scaling and deploying machine learning models to handle increasing traffic demands and integrate them with various platforms. Our team is...


  • San Francisco, California, United States Cobalt AI, LLC Full time

    Job Title: Senior Machine Learning Infrastructure Engineer. About Us: At Cobalt AI, we're advancing physical security through innovative AI-powered monitoring solutions. Our primary offering, Cobalt Monitoring Intelligence, builds on our established track record of protecting Fortune 1000 companies across more than 10 countries. We've effectively managed over...

  • AI/ML Engineer

    3 weeks ago


    San Francisco, California, United States WEX Inc Full time

    About the Role: We are seeking a highly motivated and results-oriented AI/ML Engineer to join our team at WEX Inc. As a key member of our AI Engineering team, you will be responsible for designing, testing, and deploying AI/ML platforms and tools to drive business growth and customer satisfaction. Key Responsibilities: Collaborate with stakeholders to understand...


  • San Francisco, California, United States Acceler8 Talent Full time

    About the Role: We are seeking a Senior Software Engineer (AI Infrastructure / MLOps) to join our pioneering AI startup focused on enhancing data quality for machine learning. This role offers the chance to work on large-scale web applications and tackle complex challenges in a rapidly growing field. As a Senior Software Engineer (AI Infrastructure / MLOps),...