AI Operations Engineer

6 days ago


Bellevue, United States · Highbrow LLC · Full time

Job Description:

· 5+ years of experience in DevOps engineering, with at least 3 years specializing in AI Ops or supporting ML/AI model deployment and infrastructure.

· Proven experience designing, implementing, and managing CI/CD pipelines and MLOps frameworks to automate AI/ML workflows.


Technical Skills:


· Proficiency in cloud platforms (AWS, GCP, Azure) with hands-on experience in deploying AI/ML models and utilizing AI/ML services (e.g., AWS SageMaker, Google AI Platform).

· Strong skills in containerization and orchestration tools such as Docker and Kubernetes, especially for deploying machine learning models at scale.

· Experience with infrastructure-as-code tools such as Terraform, CloudFormation, or Ansible to provision and manage cloud and on-premises environments.

· Proficiency in CI/CD tools (e.g., Jenkins, GitLab CI, CircleCI) to build automated pipelines for AI/ML model training, testing, and deployment.

· Solid understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) for model performance tracking and infrastructure observability.

· Strong programming and scripting skills in Python and Bash, along with fluency in YAML-based configuration, for automating workflows and integrating services.


AI Ops and MLOps Skills:


· Experience with MLOps best practices, including model versioning, automated retraining, and model governance for reliable and reproducible AI pipelines.

· Hands-on experience with MLOps platforms and tooling (e.g., MLflow, Kubeflow, or TFX) to track model performance, drift, and retraining needs.

· Familiarity with data pipelines and orchestration tools (e.g., Apache Airflow, Prefect) for managing data and model workflows.

· Knowledge of model deployment strategies (e.g., blue-green deployments, canary releases) to roll out AI/ML models reliably with minimal downtime.

· Experience with A/B testing and experiment tracking to evaluate model performance in production and measure the impact on business KPIs.


DevOps and Automation Skills:


· Ability to design and manage scalable infrastructure to support machine learning workloads, ensuring cost efficiency, performance, and security.

· Proficiency in automating testing and deployment processes for data and model pipelines to support fast, reliable releases.

· Familiarity with serverless architectures and cloud-native tools for AI, allowing for flexible and efficient resource management.

· Experience with security best practices, including role-based access control, data encryption, and compliance requirements for data-sensitive applications.


Communication and Collaboration Skills:

· Excellent communication skills with the ability to collaborate closely with data scientists, ML engineers, and software development teams.

· Proven ability to document infrastructure, CI/CD pipelines, and MLOps processes, ensuring transparency and knowledge sharing across teams.

· Strong problem-solving skills and a proactive approach to troubleshooting, particularly in managing and resolving deployment and performance issues.

· Ability to train and mentor team members on MLOps tools, best practices, and model deployment techniques.


Additional Qualifications:

· Experience with data security and governance standards, especially related to machine learning applications in regulated industries.

· Familiarity with AI ethics and compliance, including model fairness, transparency, and risk management.

· Knowledge of advanced monitoring and alerting tools and techniques to ensure the reliability of AI systems in production.

· Strong interest in staying up to date on the latest advancements in MLOps and AI Ops to continuously improve infrastructure and processes.

