Site Reliability Engineer, Data Infrastructure

4 hours ago


San Jose, California, United States TikTok Full time
Job Description

TikTok is a leading destination for short-form mobile video, inspiring creativity and bringing joy to users worldwide. Our mission is to empower creators and communities to thrive on our platform.

We are seeking a highly skilled Site Reliability Engineer to join our Compute Platform team, responsible for ensuring the reliability and performance of our Big Data services and products. As a key member of our team, you will play a critical role in shaping the future of our data infrastructure.

Responsibilities
  • Ensure the reliability of our major data warehouse products, services, and query engines, such as ClickHouse, Spark, Presto, and Doris.
  • Uphold Service Level Agreements (SLAs) and respond promptly to system outages or issues.
  • Continuously analyze service performance and reliability patterns to identify potential bottlenecks and implement proactive measures to prevent service disruptions.
  • Lead incident management: troubleshoot and resolve service incidents, and coordinate with cross-functional teams to manage and mitigate service-impacting events.
  • Automate infrastructure provisioning, scaling, and management processes to reduce manual interventions and improve service quality.
  • Collaborate with product and development teams to integrate reliability and performance considerations into the software lifecycle.
Qualifications
  • Bachelor's Degree or above in Computer Science, Engineering, or a related field.
  • Deep understanding of Linux, computer networking, and databases.
  • Proficient in common SRE/DevOps open-source toolsets, system monitoring tools, and container orchestration platforms like Kubernetes.
  • Experience with open-source or commercial technologies such as ClickHouse, Hadoop, Doris, Spark, Presto, and Kubernetes.
  • Strong coding skills in at least one scripting or programming language, such as Python, Shell, Java, or Go.
  • Excellent problem-solving skills and the ability to think critically under pressure.
  • Strong written and verbal communication skills, with a customer-first mindset and a strong sense of ownership.

TikTok is committed to creating an inclusive environment where employees are valued for their skills, experiences, and unique perspectives. We celebrate our diverse voices and strive to reflect the communities we reach. If you need assistance or a reasonable accommodation, please reach out to us.



  • San Jose, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Compute Platform team at TikTok. As a key member of our team, you will be responsible for ensuring the reliability and performance of our Big Data services and products. Responsibilities: Design and implement proactive measures to prevent service disruptions and ensure high...


  • San Jose, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Compute Platform team at TikTok. As a key member of our team, you will be responsible for ensuring the reliability and performance of our Big Data services and products. Responsibilities: Ensure the reliability of all TikTok's major data warehouse products, services, and query...


  • San Jose, California, United States TikTok Full time

    About the Role: TikTok is seeking a highly skilled Site Reliability Engineer to join our Compute Platform SRE team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our major data warehouse products, services, and query engines. Responsibilities: Ensure the reliability of all TikTok's major data warehouse...


  • San Jose, California, United States Adobe Systems Inc Full time

    Site Reliability Engineer. Transforming Digital Experiences: At Adobe, we're passionate about empowering people to create beautiful and powerful digital experiences. We're on a mission to hire the very best and create exceptional employee experiences where everyone is respected and has access to equal opportunity. The Opportunity: We...


  • San Jose, California, United States Splunk Full time

    About Splunk: Splunk is a leading provider of cloud-based data analytics and monitoring solutions. Our mission is to make machine data accessible, usable, and valuable to everyone. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our Cloud TechOps team. As a Site Reliability Engineer, you will be responsible for ensuring the...


  • San Jose, California, United States TikTok Full time

    Job Title: Site Reliability Engineer, Data Platform. TikTok is a leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. Our platform is built to help imaginations thrive, and we're looking for a Site Reliability Engineer to join our Data Platform team. Responsibilities: Ensure the reliability of all TikTok's...


  • San Jose, California, United States Cisco Full time

    About the Role: Cisco is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. You will work closely with our development teams to identify and resolve issues, and collaborate with other teams to...


  • San Jose, California, United States Trianz Full time

    About Trianz: Trianz is a leading-edge technology platforms and services company that accelerates digital transformations at Fortune 100 and emerging companies worldwide in data & analytics, digital experiences, cloud infrastructure, and security. Our Vision: We believe that companies around the world face three challenges in their digital transformation journeys...


  • San Jose, California, United States Altius Technologies Inc Full time

    Job Description: At Altius Technologies Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for creating and supporting automation scripts for infrastructure deployments, validations, and monitoring to improve operational tasks. Key Responsibilities: Design and implement...


  • San Jose, California, United States TikTok Full time

    Job Title: Cloud Site Reliability Engineer. We are seeking a highly skilled Cloud Site Reliability Engineer to join our team at TikTok. As a Cloud Site Reliability Engineer, you will be responsible for building, expanding, and operating ByteDance's global infrastructures, including large-scale systems in public and private clouds, data centers, and content...


  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...


  • San Jose, California, United States Western Digital Full time

    Job Overview. Company Overview: At Western Digital, we are driven by a vision to inspire global innovation and redefine technological possibilities. Our legacy as problem solvers has empowered us to achieve remarkable feats, including contributions to monumental projects like the moon landing. As a trusted partner to leading organizations worldwide, we enhance...


  • San Jose, California, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement automation scripts using shell,...


  • San Jose, California, United States Altius Technologies, Inc. Full time

    Job Title: Site Reliability Engineer. Altius Technologies, Inc. is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure and systems that support our business applications. Key Responsibilities: Design and implement automation...


  • San Jose, California, United States TikTok Full time

    Job Title: Senior Site Reliability Engineer. At TikTok, we're committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. As a Senior Site Reliability Engineer, you'll play a critical role in shaping the future of...


  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement automation scripts using...


  • San Jose, California, United States NetApp Full time

    Job Summary: As a Site Reliability Engineer at NetApp, you will be responsible for managing, supporting, and maintaining a reliable environment for our cloud infrastructure. Your primary goal will be to ensure the stability and security of our cloud-based systems and platforms. Key Responsibilities: Building and supporting a reliable cloud environment to meet the...


  • San Jose, California, United States TikTok Full time

    Recommendation Infrastructure Team. Building the Future of TikTok's Recommendation System: At TikTok, we're committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. We're...