Site Reliability Engineer

4 weeks ago


Seattle, Washington, United States TikTok Full time
About the Role

TikTok's video system is a world-leading platform that provides multimedia storage, delivery, and transcoding services. As part of the U.S. Data Security team, the Video Platform team is responsible for building the next-generation video processing platform, which provides excellent experiences for billions of users worldwide.

We are seeking an experienced Site Reliability Engineer to help us continue improving TikTok's video system. If you are passionate about ensuring software reliability, love problem-solving, and are prepared for exciting challenges, we would like you to join our team.

Responsibilities

- Ensure the overall reliability of TikTok's video system, including video publishing and distribution.
- Perform lifecycle management of production systems, including change management, service deployment, operations, and emergency response.
- Monitor the system and respond to incidents to maintain the system's service level agreement (SLA); review and follow up on all production incidents.
- Perform capacity management of compute, storage, and network bandwidth resources to ensure system stability and reduce infrastructure costs.
- Provide strong support during major events to ensure the system can handle a large volume of Internet traffic.
- Build tools, automations, visualizations, and monitors to facilitate the operation and optimization of the global infrastructure.

Requirements

- Bachelor's degree in Computer Science or a related technical background involving software/system engineering, or equivalent working experience.
- 2+ years of SRE or DevOps experience in large-scale online services.
- Programming experience with at least one of the following languages: C, C++, Java, Python, C#, or Go.

What We Offer

- A dynamic and collaborative work environment.
- Opportunities for growth and professional development.
- A competitive salary and benefits package.

How to Apply

Interested candidates should submit their resume and a cover letter outlining their experience and qualifications for the role. We look forward to hearing from you.
