Site Reliability Engineer, Data Infrastructure

2 weeks ago


San Jose, California, United States | TikTok | Full time
About the Role

We are seeking a highly skilled Site Reliability Engineer to join our Compute Platform team at TikTok. As a key member of our team, you will be responsible for ensuring the reliability and performance of our Big Data services and products.

Responsibilities
  • Ensure the reliability of all of TikTok's major data warehouse products, services, and query engines, such as ClickHouse, Spark, Presto, and Doris.
  • Uphold Service Level Agreements (SLAs) and ensure that all service level objectives (SLOs) for ByteDance's Data Platform services are met.
  • Continuously analyze service performance and reliability patterns to identify potential performance bottlenecks and implement proactive measures to prevent service disruptions.
  • Lead incident troubleshooting, resolution, and postmortems, coordinating with cross-functional teams to manage and mitigate service-impacting events.
  • Automate infrastructure provisioning, scaling, and management processes to reduce manual interventions and improve service quality.
  • Collaborate with product and development teams to integrate reliability and performance considerations into the software lifecycle.

Qualifications
  • Bachelor's degree or higher in Computer Science, Engineering, or a related field.
  • In-depth understanding of Linux, computer networking, and databases.
  • Proficiency with common open-source SRE/DevOps toolsets, system monitoring tools, and container orchestration platforms such as Kubernetes.
  • Experience or familiarity with open-source or commercial technologies such as ClickHouse, Hadoop, Doris, Spark, Presto, and Kubernetes.
  • Strong coding skills in at least one scripting or programming language, such as Python, Shell, Java, or Go.
  • Excellent problem-solving skills and the ability to think critically under pressure.
  • Strong written and verbal communication skills and a customer-first mindset.

About TikTok

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy.

We are passionate about this and hope you are too. TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws.
