AI Ops Site Reliability Engineer

3 weeks ago


San Jose, CA, United States TikTok Full time

Description

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible. Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Join our innovative Site Reliability Engineering (SRE) team that merges software development with infrastructure operations to manage large-scale, highly distributed systems. We leverage cutting-edge AI technology, such as Large Language Models (LLMs), for efficiency and are actively shaping the future of AI Ops technology.

Key Responsibilities:
- Develop and implement AI-based software for efficient and intelligent management of service-oriented architecture (SOA), driving research on ML algorithms and leveraging AI technology to solve complex site reliability issues.
- Explore practical applications of LLM technology in the field of AI Ops, providing algorithmic services such as intelligent interaction, root cause analysis, and anomaly detection.
- Construct an LLM applications framework, integrate it into a unified SRE software platform, and provide intelligent services to enhance operational efficiency.
- Continuously keep up with cutting-edge LLM technologies, open-source solutions, and their applications in the field of AI Ops.

Qualifications
- Bachelor's degree in Computer Science or equivalent, with 5+ years of experience as an ML Engineer or ML Applied Scientist.
- Experience with AI Ops, particularly with the stability of cloud platforms. This includes, but is not limited to, anomaly detection, log monitoring, fault diagnosis, and root cause analysis.
- Proficiency in the algorithmic principles of mainstream large language models (such as GPT, ChatGPT, LLaMA), fine-tuning strategies, prompt engineering, vector databases, and application paradigms like LangChain.
- Strong problem-solving and communication skills, excellent data sensitivity, and business understanding, capable of deriving valuable insights from complex business data.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at dataecommerce.accommodations@tiktok.com

Regular, Experienced



  • San Jose, United States HCLTech Full time

    About HCLTech:HCLTech is a global technology company, home to 221,000+ people across 60 countries, delivering industry-leading capabilities centered around digital, engineering and cloud, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Engineering Services,...

  • Side Hustle Expert

    5 days ago


    San Jose, United States AI Prompt Engineer - Fud Full time

    At Fud, we are revolutionizing the way people approach making money by creating the world's first Social Hustling Community. Our platform connects individuals with the know-how and resources they need to take action and put more money in their pockets. We believe that everyone has the potential to be a side hustle expert, and we are looking for an...


  • San Francisco, CA, United States Ponce Ai Full time

    Job Description What to Expect: We are seeking a skilled and creative ML Ops Engineer to join our team. As an ML Ops Engineer you will be responsible for utilizing open-source diffusion image generation models to develop high-quality and visually appealing photorealistic images that incorporate the cosmetic medical procedure results....

  • AI/ML Ops Engineer

    4 days ago


    San Francisco, CA, United States Advocate Full time

    Advocate is a mission-driven technology company revolutionizing the way Americans access critical federal benefits. Our cutting-edge AI platform streamlines the application process, ensuring that every submission is complete, optimized, and tailored to the specific requirements of each federal program. Our innovative technology not only simplifies the...


  • Redwood City, CA, United States C3 AI Full time

    C3 AI (NYSE:AI) is a leading Enterprise AI software provider for accelerating digital transformation. The proven C3 AI Platform provides comprehensive services to build enterprise-scale AI applications more efficiently and cost-effectively than alternative approaches. The C3 AI Platform supports the value chain in any industry with prebuilt, configurable,...


  • San Francisco, CA, United States Apollo Solutions Full time

    Site Reliability Engineer. Apollo Solutions have partnered with a groundbreaking artificial intelligence business who are making major developments in how we use AI/ML for gaming/security. They are working closely with government contracts as well as gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...


  • San Francisco, United States Ponce Ai Full time

    Job Description What to Expect: We are seeking a skilled and creative ML Ops Engineer to join our team. As an ML Ops Engineer you will be responsible for utilizing open-source diffusion image generation models to develop high-quality and visually appealing photorealistic images that incorporate the cosmetic medical procedure results. You’ll...


  • San Francisco, United States Snorkel AI Full time

    We're on a mission to democratize AI by building the definitive AI data development platform. The AI landscape has gone through incredible change between 2016, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. But one thing has remained constant: the data you use to build AI is the key to...


  • Redwood City, CA, United States Snorkel AI, Inc. Full time

    We are looking for a Director of Engineering to lead our AI Platform team. Our AI Platform team builds innovative software systems to power the Snorkel Flow platform. This includes services to train and serve generative AI and machine learning models using novel data-centric techniques, libraries to support AI workflows for a variety of data modalities and...


  • San Francisco, United States Snorkel AI, Inc. Full time

    We are looking for a Director of Engineering to lead our AI Platform team. Our AI Platform team builds innovative software systems to power the Snorkel Flow platform. This includes services to train and serve generative AI and machine learning models using novel data-centric techniques, libraries to support AI workflows for a variety of data modalities and...


  • San Francisco, CA, United States Alembic Full time

    Alembic applies cutting-edge algorithms and composite AI solutions to provide a new approach for marketing data analytics. Unlike tools that only provide correlation, only Alembic provides true causation, giving organizations across sector and industry the ability to quantify the value of every marketing activity and maximize future marketing investments....


  • San Francisco, United States Anthropic Full time

    We are looking for a Site Reliability Engineer who will ensure the high availability and performance of our Kubernetes clusters that power machine learning research and services. About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole....


  • San Francisco, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • San Jose, United States Myriad Consulting Inc Full time

    This role is also open to junior (3+ yoe) candidates and SRE leads (7+ yoe). The Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges of scale, while using expertise in coding,...

  • AI Engineer

    4 days ago


    San Francisco, CA, United States Cynch AI Full time

    Come Revolutionize Accounting with AI We're a seed-stage AI startup led by a founding team of AI startup veterans on a mission to revolutionize the accounting industry. We're seeking an exceptional AI Engineer to help turn our vision into reality. We are combining reasoning, machine learning, and generative AI to augment and democratize the...


  • San Francisco, CA, United States Observable Full time

    Observable is redefining how businesses create and share data apps by giving developers the tools they need to create their best dashboards, applications, and reports. Our open-source Framework allows developers to build dashboards locally while our secure hosting service makes it easy for teams to share data apps and discover deeper insights together. We...


  • San Francisco, CA, United States NLP PEOPLE Full time

    Job Description What to Expect: We are seeking a skilled and creative ML Ops Engineer to join our team. As an ML Ops Engineer you will be responsible for utilizing open-source diffusion image generation models to develop high-quality and visually appealing photorealistic images that incorporate the cosmetic medical procedure results. You’ll work...


  • San Francisco, California, United States Observable Full time

    Observable is seeking a full-time infrastructure and site reliability engineer to help improve, administer, and grow Observable systems as we scale to meet our customers' needs. What you will do: Perform site reliability and ops work for Observable production and staging environments (manage servers, tweak WAF rules, optimize SQL queries, and more). Design and...