Reliability Engineer

2 weeks ago


Mountain View, California, United States | TikTok | Full time

TikTok is a premier platform for short-form mobile video, dedicated to fostering creativity and delivering joy. Our Trust and Safety engineering division is rapidly expanding, focusing on developing machine learning models and systems aimed at identifying and mitigating internet abuse and fraud across our platform.

Our objective is to protect billions of users and publishers around the world every day. We apply machine learning to strengthen our trust and safety systems, drawing on the large volume of data generated on our platform.

Through the relentless efforts of our team, TikTok strives to provide an exceptional user experience, spreading joy worldwide.

In this role, you will tackle complex scalability challenges while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design.

Key Responsibilities:

  • Oversee daily operations of data services and real-time/batch data pipelines, including SLA management, system deployment, performance optimization, and troubleshooting.
  • Develop tools and automation to enhance system administration and operational efficiency.
  • Participate in regular on-call duties.
  • Contribute to and improve the entire service lifecycle, from inception and design through development, capacity planning, launch reviews, deployment, operation, and refinement.
  • Design and implement software platforms and monitoring frameworks for effective, automated, and intelligent service-oriented architecture (SOA) governance.
  • Ensure sustainable system scalability through automation; enhance system reliability, efficiency, and velocity by advocating for necessary changes.
  • Provide sustainable user support, respond to incidents, and conduct blameless postmortems.

Qualifications:

  • Bachelor's degree in Computer Science or a related field, with a minimum of 3 years of relevant experience.
  • Proven independent thinking and troubleshooting abilities.
  • Proficiency in programming languages such as Python, Go, C, C++, Java, or Rust.
  • Familiarity with backend systems such as MySQL, Redis, Nginx, Kafka, Kubernetes, and Docker, and with big data technologies such as Hadoop, Spark, Flink, Hive, and OLAP engines like ClickHouse.
  • Understanding of Unix/Linux system internals, networking, and distributed systems.
  • Strong communication and coordination skills.
  • Experience in Trust & Safety is advantageous.

TikTok is dedicated to fostering an inclusive environment where employees are recognized for their skills, experiences, and unique perspectives. Our platform connects individuals globally, and we aim to reflect this diversity within our workplace. We believe that all individuals should be evaluated based on their strengths and experiences, free from bias related to background or identity.

TikTok is committed to providing reasonable accommodations during our recruitment process.


