Site Reliability Engineer, AI Platform Training

2 weeks ago


Seattle, Washington, United States Adobe Systems Full time
Job Summary

We are seeking a highly skilled Site Reliability Engineer to join our team at Adobe Firefly, our AI Training Platform. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and security of our platform.

Key Responsibilities
  • Identify and implement methodologies and solutions to increase reliability, scalability, security, and efficiency.
  • Ensure the highest uptime and Quality of Service (QoS) for Adobe's customers through operational excellence.
  • Define service level objectives (SLOs) and indicators (SLIs) to represent and measure service quality.
  • Support and maintain globally distributed, multi-cloud (public and/or private) environments.
  • Automate common, repeatable tasks at a large scale to streamline operational procedures.
  • Identify areas to improve service resiliency through techniques such as chaos engineering, performance/load testing, etc.
  • Coordinate with other Adobe platform teams and service providers (primarily AWS) to innovate on Generative AI as a Service.
Requirements
  • A Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, and 5+ years of relevant industry experience.
  • Experience building and scaling distributed systems.
  • Production-level expertise with container orchestration engines (e.g., Kubernetes) and a proven understanding of modern continuous development techniques and pipelines (IaC, CI/CD, ArgoCD, Git).
  • Fundamental programming skills, ideally with practical experience in Python, Go, or both.
  • Good knowledge of infrastructure configuration management tools like Ansible and Terraform.
  • Experience in using observability and tracing-related tools like InfluxDB, Prometheus, and Elastic Stack.
  • An understanding of AI/ML, including ML frameworks, public cloud, and commercial AI/ML solutions. Familiarity with PyTorch, SageMaker, Hugging Face, NVIDIA TensorRT, or OpenAI Triton is a plus.
What We Offer

At Adobe, we offer a competitive compensation package, including a base salary and short-term incentives. We also offer long-term incentives in the form of a new hire equity award. Adobe is proud to be an Equal Employment Opportunity and affirmative action employer. We do not discriminate based on gender, race or color, ethnicity or national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, or any other applicable characteristics protected by law.



  • Seattle, Washington, United States Tik Tok Full time

    About the Role: We're seeking a skilled Site Reliability Engineer to join our AML team at TikTok. As a Site Reliability Engineer, you'll play a critical role in designing, building, and maintaining highly available, scalable, and fault-tolerant systems that support our AI/ML recommendation engine. Responsibilities: Design and build highly available, scalable, and...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: TikTok is a leading destination for short-form mobile video, and we're committed to inspiring creativity and bringing joy to our users. As a Site Reliability Engineer on our Video Platform team, you'll play a critical role in ensuring the reliability and performance of our video system, which serves billions of users...


  • Seattle, Washington, United States Phaidra Full time

    About Phaidra: Phaidra is a pioneering company in the industrial automation sector, leveraging AI-powered control systems to enable facilities to adapt and improve over time. Our mission is to revolutionize the way industrial facilities operate, making them more efficient, sustainable, and responsive to their environment. Job Description: We are seeking a highly...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, fault-tolerance, and scalability of our data infrastructure. Responsibilities: Design and implement reliable, scalable, and robust big data systems that support core...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, fault-tolerance, and scalability of our data infrastructure. Key Responsibilities: Design, build, and maintain large-scale data systems that support our core products and...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: TikTok is seeking a skilled Site Reliability Engineer to join our AML team, where you will combine system engineering and machine learning expertise to develop and run a massively distributed AI/ML recommendation system. Responsibilities: Design, build, and maintain highly available, scalable, and fault-tolerant systems. Monitor and analyze system...


  • Seattle, Washington, United States Maestro AI Full time

    About Maestro AI: Maestro AI is a cutting-edge technology company that specializes in developing innovative AI solutions for various industries. Our mission is to revolutionize organizational efficiency by making work understandable, streamlined, and connected for everyone. Our Vision: We envision a future where every organization can harness the full potential...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform Team. As a key member of our team, you will be responsible for designing, building, and maintaining large-scale, distributed data systems that support our core products and business. Responsibilities: Engage in the entire lifecycle of service, from inception and...


  • Seattle, Washington, United States Phaidra Full time

    About Phaidra: Phaidra is a cutting-edge technology company that's revolutionizing the industrial automation sector. Our mission is to empower facilities to adapt and improve over time, leveraging AI-powered control systems that learn and evolve continuously. We're a team of innovators, engineers, and problem-solvers who share a passion for creating...


  • Seattle, Washington, United States Tik Tok Full time

    Site Reliability Engineer. About the Role: TikTok is seeking an experienced Site Reliability Engineer to join our USDS Video Platform team. As a key member of our team, you will be responsible for ensuring the reliability and scalability of our video system, which serves billions of users worldwide. Responsibilities: Design and implement...


  • Seattle, Washington, United States Tik Tok Full time

    About TikTok U.S. Data Security: TikTok is a leading destination for short-form mobile video, inspiring creativity and bringing joy to users worldwide. U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S., dedicated to protecting user data and ensuring the security of the TikTok platform. Job Description: We are seeking an experienced Site Reliability...


  • Seattle, Washington, United States Tik Tok Full time

    Site Reliability Engineer. About the Role: TikTok is seeking a highly skilled Site Reliability Engineer to join our US Data Security team. As a key member of our Video Platform team, you will be responsible for ensuring the reliability and scalability of our video system, which serves billions of users worldwide. Responsibilities: Design...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: TikTok is the leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. Our U.S. Data Security (USDS) division was created to bring heightened focus and governance to our data protection policies and content assurance protocols, ensuring the safety of U.S. users. We're seeking a skilled Site...


  • Seattle, Washington, United States Tik Tok Full time

    About TikTok U.S. Data Security: TikTok U.S. Data Security is a subsidiary of TikTok in the U.S., dedicated to protecting user data and ensuring the security of our platform. Responsibilities: We are seeking a highly motivated and experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the...


  • Seattle, Washington, United States Scale AI Full time

    Lead Account Executive - AI Growth: Scale is seeking an experienced Lead Account Executive to drive growth and impact from our AI infrastructure platform. As part of our growing GTM team, you will oversee a customer segment, drive pipeline, and close deals within this segment. Key Responsibilities: Own Scale's execution for driving impact through customer growth...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: We are seeking an experienced Site Reliability Engineer to join our USDS Video Platform team at TikTok. As a key member of our team, you will be responsible for ensuring the reliability and scalability of our video system, which serves billions of users worldwide. Responsibilities: Design and implement scalable and reliable systems to support our...


  • Seattle, Washington, United States Oracle Full time

    Job Description: Oracle is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, develop, and deploy automation tools to improve the efficiency and reliability of our...


  • Seattle, Washington, United States Tik Tok Full time

    About the Role: This is a Site Reliability Engineer position, focusing on the data pipeline reliability for the Video Platform team in USDS. Data SREs monitor data and keep production batch and real-time processing jobs up and running with the highest level of availability, ensuring our users have the freshest, complete, and correct data...