Site Reliability Engineer for AI Platform

1 month ago


San Jose, California, United States Adobe Full time

About the Role

We're seeking a highly skilled Site Reliability Engineer to join our team at Adobe, working on the AI Training Platform. As a key member of our team, you'll be responsible for ensuring the highest uptime and Quality of Service (QoS) for our customers.

Key Responsibilities

  • Design and implement methodologies to increase reliability, scalability, security, and efficiency.
  • Collaborate with cross-functional teams to define service level objectives (SLOs) and indicators (SLIs) to represent and measure service quality.
  • Develop and maintain globally distributed, multi-cloud environments to support our AI platform.
  • Automate common, repeatable tasks at a large scale to streamline operational procedures.
  • Identify areas to improve service resiliency through techniques such as chaos engineering and performance/load testing.
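To make the SLO/SLI responsibility above concrete, here is a minimal sketch of how an availability SLI and the remaining error budget might be computed. The function names, the request counts, and the 99.9% target are illustrative assumptions, not figures or tooling from Adobe.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the availability SLI: the fraction of requests served successfully."""
    if total_requests == 0:
        # Assumption: a window with no traffic counts as fully available.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Return the unspent share of the error budget.

    1.0 means the budget is untouched; 0.0 or less means it is exhausted.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # failure rate the SLO tolerates
    actual_failure = 1.0 - sli          # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests, 500 failures, 99.9% SLO.
    sli = availability_sli(999_500, 1_000_000)
    budget = error_budget_remaining(sli, slo_target=0.999)
    print(f"SLI: {sli:.4%}, error budget remaining: {budget:.1%}")
```

In this framing the SLI is the measurement and the SLO is the target it is compared against; the remaining budget is one common way to decide when to prioritize reliability work over new features.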

Requirements

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, and 5+ years of relevant industry experience.
  • Experience building and scaling distributed systems, and with containerization and orchestration technologies such as Kubernetes.
  • Production-level expertise with container orchestration engines and a proven understanding of modern continuous development techniques and pipelines.
  • Fundamental programming skills, ideally with practical experience in at least one (and preferably both) of the following languages: Python, Go.
  • Good knowledge of infrastructure configuration management tools like Ansible and Terraform.
  • Experience with observability and tracing tools such as InfluxDB, Prometheus, and the Elastic Stack.
  • An understanding of AI/ML, including ML frameworks, public cloud, and commercial AI/ML solutions.

About Adobe

At Adobe, we're passionate about empowering people to create beautiful and powerful images, videos, and apps, and transform how companies interact with customers across every screen. We're committed to creating exceptional employee experiences where everyone is respected and has access to equal opportunity.

Compensation and Benefits

Our compensation reflects the cost of labor across several U.S. geographic markets, and we pay differently based on those defined markets. The U.S. pay range for this position is $124,000 to $234,200 annually. Pay within this range varies by work location and may also depend on job-related knowledge, skills, and experience.

At Adobe, we're proud to be an Equal Employment Opportunity and affirmative action employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other applicable characteristics protected by law.