AI Ops Site Reliability Engineer

3 weeks ago


San Jose, United States TikTok Full time

Description

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Singapore, Jakarta, Seoul and Tokyo.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible. Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Join our innovative Site Reliability Engineering (SRE) team, which merges software development with infrastructure operations to manage large-scale, highly distributed systems. We use cutting-edge AI technology, such as Large Language Models (LLMs), to improve efficiency, and we are actively shaping the future of AI Ops technology.

Key Responsibilities:
- Develop and implement AI-based software for the efficient and intelligent management of service-oriented architecture (SOA), drive research on ML algorithms, and apply AI technology to solve complex site reliability issues.
- Explore practical applications of LLM technology in AI Ops, providing algorithmic services such as intelligent interaction, root cause analysis, and anomaly detection.
- Build an LLM application framework, integrate it into a unified SRE software platform, and provide intelligent services that improve operational efficiency.
- Keep up with cutting-edge LLM technologies, open-source solutions, and their applications in AI Ops.

Qualifications
- Bachelor's degree in Computer Science or equivalent, with 5 years of experience as an ML Engineer or ML Applied Scientist.
- Experience with AI Ops, particularly the stability of cloud platforms, including but not limited to anomaly detection, log monitoring, fault diagnosis, and root cause analysis.
- Proficiency in the algorithmic principles of mainstream large language models (such as GPT, ChatGPT, and LLaMA), fine-tuning strategies, prompt engineering, vector databases, and application frameworks such as LangChain.
- Strong problem-solving and communication skills, excellent data sensitivity, and business understanding, with the ability to derive valuable insights from complex business data.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at dataecommerce.accommodationstiktok.com

Regular
Experienced



  • San Jose, United States HCLTech Full time

    About HCLTech:HCLTech is a global technology company, home to 221,000+ people across 60 countries, delivering industry-leading capabilities centered around digital, engineering and cloud, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Engineering Services,...


  • San Francisco, United States Ponce Ai Full time

    What to Expect: We are seeking a skilled and creative ML Ops Engineer to join our team. As an ML Ops Engineer, you will be responsible for using open-source diffusion image generation models to develop high-quality, visually appealing photorealistic images that incorporate the results of cosmetic medical procedures. You’ll...


  • San Francisco, United States Snorkel AI Full time

    We're on a mission to democratize AI by building the definitive AI data development platform. The AI landscape has gone through incredible change from 2016, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI breakthroughs of today. But one thing has remained constant: the data you use to build AI is the key to...


  • San Francisco, United States Snorkel AI, Inc. Full time

    We are looking for a Director of Engineering to lead our AI Platform team. Our AI Platform team builds innovative software systems to power the Snorkel Flow platform. This includes services to train and serve generative AI and machine learning models using novel data-centric techniques, libraries to support AI workflows for a variety of data modalities and...


  • San Francisco, United States Anthropic Full time

    We are looking for a Site Reliability Engineer who will ensure the high availability and performance of our Kubernetes clusters that power machine learning research and services. About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole....


  • San Francisco, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • San Jose, United States Myriad Consulting Inc Full time

    This role is also open to junior (3+ years of experience) and SRE lead (7+ years of experience) candidates. The Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. On our team, you'll have the opportunity to manage the complex challenges of scale while using expertise in coding,...


  • San Francisco, California, United States Observable Full time

    Observable is seeking a full-time infrastructure and site reliability engineer to help improve, administer, and grow Observable systems as we scale to meet our customers' needs. What you will do: Perform site reliability and ops work for Observable production and staging environments (manage servers, tweak WAF rules, optimize SQL queries, and more). Design and...


  • San Francisco, United States Talkdesk Full time

    At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including...


  • San Jose, United States HireIO Inc Full time

    Introduction: We are an all-in-one video editing solution that helps you create incredible videos. With the mission of making content creation easier and more engaging, we first launched on mobile platforms in April 2020. In less than a year, we were released in Brazil, the US, Indonesia, Japan and several other countries. To...

  • AI Engineer

    2 weeks ago


    San Jose, United States Diverse Lynx Full time

    Engineer - AI/ AI/Client, Python, Linux, C/C++, Shell Scripting. Bachelor's/master's in computer science, computer engineering, data science/analytics, or a related field. Strong Python programming skills. Good C/C++ programming skills. Excellent written/verbal communication skills. Experience in a field associated with the deployment of AI/Client models. Experience...

  • AI Engineer

    1 month ago


    San Jose, United States Diverse Lynx Full time

    Engineer - AI/ AI/Client, Python, Linux, C/C++, Shell Scripting. Bachelor's/master's in computer science, computer engineering, data science/analytics, or a related field. Strong Python programming skills. Good C/C++ programming skills. Excellent written/verbal communication skills. Experience in a field associated with the deployment of AI/Client models. Experience...

  • AI Engineer

    3 weeks ago


    San Jose, United States Diverse Lynx Full time

    Engineer - AI/ AI/Client, Python, Linux, C/C++, Shell Scripting. Bachelor's/master's in computer science, computer engineering, data science/analytics, or a related field. Strong Python programming skills. Good C/C++ programming skills. Excellent written/verbal communication skills. Experience in a field associated with the deployment of AI/Client models. Experience...


  • San Jose, United States Hireio, Inc. Full time

    Introduction: We are an all-in-one video editing solution that helps you create incredible videos. With the mission of making content creation easier and more engaging, we first launched on mobile platforms in April 2020. In less than a year, we were released in Brazil, the US, Indonesia, Japan and several other countries. To...


  • San Francisco, United States OpenAI Full time

    About the team: Reliable services are what enable OpenAI to train the best AI models in the world and to bring the promise of safe, effective AI to the world. The SRE team in research is responsible for defining, measuring, and improving the reliability of the research platform. The SRE team works closely with the supercomputing and hardware health teams...


  • San Jose, United States OKX Full time

    Who We Are: OKX is revolutionising world systems through our cutting-edge digital asset exchange, Web3 portal and blockchain ecosystems. We are deeply committed to shaping a fairer, more transparent and accessible society through blockchain technology, and to date, we have 50+ million users, 3000+ employees and 180+ countries believing in the same vision as us....