Lead Site Reliability Engineer for Recommendation Systems

2 weeks ago


San Jose, California, United States | TikTok | Full time

TikTok stands as the premier platform for short-form mobile video, dedicated to fostering creativity and spreading joy.

Our global headquarters are in Los Angeles and Singapore, with additional offices in New York, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.

Why Choose TikTok? At the heart of TikTok's mission is the celebration of creativity. Our platform is designed to empower imaginations, and this philosophy extends to the talented teams that drive our success.

We believe that every challenge presents an opportunity for learning, innovation, and collective growth. Embracing change and fostering courage are fundamental to our culture. At TikTok, collaboration is key to our impact, both for our organization and for the communities we engage with.

About the Recommendation Infrastructure Team: This team is tasked with developing and enhancing the architecture that supports our recommendation system, ensuring a stable and exceptional experience for TikTok users.

Role Responsibilities:

  • Oversee and enhance the entire lifecycle of recommendation systems, from initial design consultations to deployment, operation, and ongoing refinement.
  • Create tools and software aimed at boosting the reliability and scalability of services, automating operations, and enhancing research and development efficiency.
  • Ensure the availability of large-scale services deployed across global data centers.
  • Plan, manage, and optimize cloud resource utilization while meeting service level agreements (SLAs) for large clusters.
  • Continuously measure and monitor service availability, latency, and overall health.
  • Implement sustainable incident response practices and conduct thorough postmortems.

Qualifications:

  • A Bachelor's degree or higher in Computer Science or a related discipline.
  • A minimum of 2 years of experience in Site Reliability Engineering for large-scale system deployments with a focus on high reliability and scalability.
  • Proficiency in system operations, particularly in Linux and networking.
  • Experience in programming with at least one of the following languages: Python, Perl, Go, or C/C++.
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Familiarity with popular CI/CD processes and environments.
  • Strong communication skills, along with a sense of ownership and motivation.

Commitment to Inclusion: TikTok is dedicated to fostering an inclusive environment where every employee is valued for their unique skills, experiences, and perspectives. Our platform connects individuals globally, and we strive to reflect this diversity within our workplace.

We are passionate about our mission to inspire creativity and bring joy, and we are committed to celebrating diverse voices while creating an environment that mirrors the communities we serve.

Accommodations: TikTok is committed to providing reasonable accommodations during the recruitment process for candidates who need them because of a disability, pregnancy, sincerely held religious beliefs, or other legally protected reasons.


