Site Reliability Engineer

4 weeks ago


Seattle, Washington, United States TikTok Full time

About the Role

TikTok is seeking an experienced Site Reliability Engineer to join our US Data Security team. As a key member of our Video Platform team, you will be responsible for ensuring the reliability and performance of our video system, which serves billions of users worldwide.

Key Responsibilities

  • Oversee the overall reliability of TikTok's video system, including video publishing and distribution.
  • Perform lifecycle management of production systems, including change management, service deployment, operations, and emergency response.
  • Monitor the system and respond to incidents to maintain the system's service level agreement (SLA); review and follow up on all production incidents.
  • Perform capacity management of compute, storage, and network bandwidth resources to ensure system stability and reduce infrastructure costs.
  • Provide strong support during major events to ensure the system can handle a large volume of Internet traffic.
  • Build tools, automations, visualizations, and monitors to facilitate the operation and optimization of the global infrastructure.

Requirements

  • Bachelor's degree in Computer Science or a related technical background involving software/system engineering, or equivalent working experience.
  • 2+ years of SRE or DevOps experience in large-scale online services.
  • Programming experience with at least one of the following languages: C, C++, Java, Python, C#, or Go.

Preferred Qualifications

  • Extensive knowledge of networking, operating systems, database systems, and container technology.
  • Good understanding of every aspect of microservice architecture, and hands-on experience troubleshooting large-scale distributed systems.
  • Hands-on experience with common open-source systems such as Linux, MySQL, MongoDB, Redis, and ELK.
  • Experience in building solutions with AWS, Google, Azure, and other cloud services is a plus.
  • Passionate and self-motivated, with good teamwork skills.

About TikTok

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy.

We are passionate about this and hope you are too. If you are passionate about ensuring software reliability, love problem-solving, and are prepared for exciting challenges, we would like you to join our team.



  • Seattle, Washington, United States Sogeti Full time

    Job Title: Site Reliability Engineer. About the Role: We are seeking an experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using Azure or...


  • Seattle, Washington, United States Oracle Full time

    About the Role: Oracle is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, develop, and deploy software to improve the availability, scalability, and efficiency of...


  • Seattle, Washington, United States Oracle Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Oracle. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure. You will work closely with our development teams to design, implement, and operate large-scale distributed...


  • Seattle, Washington, United States HireIO Inc Full time

    Job Summary: At HireIO Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and reliability of our Ads systems. This includes designing, analyzing, and troubleshooting large-scale distributed systems, as well as developing tools and...


  • Seattle, Washington, United States Diverse Lynx Full time

    Job Title: Sr. Site Reliability Engineer. Location: Remote. Duration: 12+ months contract. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our applications and services. You will work...


  • Seattle, Washington, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform Team at TikTok. As a key member of our team, you will be responsible for designing, building, and operating large-scale, massively distributed services and infrastructures. Key Responsibilities: Design and implement reliable, scalable, and robust big data systems...


  • Seattle, Washington, United States TikTok Full time

    About the Role: This is a Site Reliability Engineer position, focusing on the data pipeline reliability for the Video Platform team in USDS. Data SREs monitor data and keep production batch and real-time processing jobs up and running with the highest level of availability, ensuring our users have the freshest, complete, and correct data...


  • Seattle, Washington, United States Hireio, Inc. Full time

    Job Overview: Hireio, Inc. is seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our Ads systems team, you will be responsible for ensuring the reliability, scalability, and operability of our services. Key Responsibilities: Design and implement scalable and reliable systems architecture. Collaborate with cross-functional teams...


  • Seattle, Washington, United States F5 Networks Full time

    Job Summary: F5 Networks is seeking a highly skilled Site Reliability Engineer III to join our team. As a Site Reliability Engineer III, you will be responsible for ensuring the reliability, availability, and scalability of critical systems and SaaS platforms. Key Responsibilities: Apply modern engineering principles and practices to operational functions and...


  • Seattle, Washington, United States DAT Freight Solutions Full time

    About DAT Freight Solutions: DAT Freight Solutions is a leading provider of transportation management software and services. We are seeking a highly skilled Site Reliability Engineering Lead to join our team. The successful candidate will be responsible for leading major technical initiatives and mentoring engineers to enhance their skills. They will work...


  • Seattle, Washington, United States Qualtrics Full time

    We are looking for a Site Reliability Engineer Manager to lead our Gov1 environment in the Foundation Product Unit. This person will be responsible for managing a team of US-based Support Engineers who will support Gov1 activities for non-US teams in the Foundation org. The ideal candidate will have experience in site reliability engineering, team management,...


  • Seattle, Washington, United States Qualtrics Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer Manager to lead our SRE team in the Foundation Product Unit. As a key member of our team, you will be responsible for ensuring the reliability and scalability of our Gov1 environment. As a Site Reliability Engineer Manager, you will be responsible for leading a team of SREs, collaborating...


  • Seattle, Washington, United States F5 Networks Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at F5 Networks. As a key member of our engineering team, you will be responsible for ensuring the reliability and performance of our systems. Key Responsibilities: Design and implement scalable and efficient system architectures. Develop and maintain monitoring and...


  • Seattle, Washington, United States DAT Solutions Full time

    About DAT Solutions: We are a next-generation SaaS technology company that has been at the leading edge of innovation in transportation supply chain logistics for 45 years. We continue to transform the industry year over year by deploying a suite of software solutions to millions of customers every day, customers who depend on us for the most relevant data...


  • Seattle, Washington, United States Apple Full time

    Role Overview: As a Site Reliability Engineering Manager at Apple, you will be responsible for leading a team that provides the platform for mission-critical cloud systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to flourish. Key Responsibilities: Establish SRE practices for a private cloud service to accelerate...


  • Seattle, Washington, United States DAT Solutions Full time

    About DAT Solutions: As a leading employer of choice, DAT Solutions is a next-generation SaaS technology company that has been at the forefront of innovation in transportation supply chain logistics for decades. We continue to transform the industry by deploying a suite of software solutions to millions of customers every day, providing them with the most...


  • Seattle, Washington, United States ApTask Full time

    The Client is a leading global IT services and consulting company, providing a wide range of services to clients in various industries, including banking, financial services, retail, manufacturing, healthcare, and more. The company places a strong emphasis on employee training and development, and is known for its commitment to innovation and investment in...


  • Seattle, Washington, United States Oracle Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Oracle. As a key member of our cloud infrastructure team, you will be responsible for designing, building, and maintaining large-scale distributed systems that provide a seamless experience for our customers. Key Responsibilities: Design and implement sophisticated...


  • Seattle, Washington, United States F5 Full time

    Job Summary: At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation. Everything we do...


  • Seattle, Washington, United States Apple Full time

    Job Summary: The Apple Services Engineering team is seeking a highly skilled Site Reliability Engineering Leader to lead our security-focused SRE team. As a Site Reliability Engineering Leader, you will be responsible for designing, engineering, and running systems and infrastructure that ensure the highest quality Apple Services experience for our customers....