Site Reliability Engineer, Data Platform

1 week ago


San Jose, California, United States Tik Tok Full time
Job Title: Site Reliability Engineer, Data Platform

TikTok is a leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. Our platform is built to help imaginations thrive, and we're looking for a Site Reliability Engineer to join our Data Platform team.

Responsibilities:
  • Ensure the reliability of all of TikTok's major data warehouse products, services, and query engines, such as ClickHouse, Spark, Presto, and Doris.
  • Uphold Service Level Agreements (SLAs) and ensure that all service level objectives (SLOs) for ByteDance's Data Platform services are met.
  • Continuously analyze service performance and reliability patterns to identify potential performance bottlenecks and implement proactive measures to prevent service disruptions.
  • Lead efforts to troubleshoot and resolve service incidents and to conduct postmortems, coordinating with cross-functional teams to manage and mitigate service-impacting events.
  • Automate infrastructure provisioning, scaling, and management processes to reduce manual interventions and improve service quality.
  • Collaborate with product and development teams to integrate reliability and performance considerations into the software lifecycle.
Requirements:
  • Bachelor's Degree or above in Computer Science, Engineering, or a related field.
  • In-depth understanding of Linux, computer networking, and databases.
  • Proficient in common SRE/DevOps open-source toolsets, system monitoring tools, and container orchestration platforms like Kubernetes.
  • Experience or familiarity with open-source or commercial technologies such as ClickHouse, Hadoop, Doris, Spark, Presto, and Kubernetes.
  • Strong coding skills in at least one scripting or programming language, including but not limited to Python, Shell, Java, and Go.
  • Excellent problem-solving skills and the ability to think critically under pressure.
  • Strong written and verbal communication skills and a customer-first mindset.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. We're passionate about this and hope you are too.



  • San Jose, California, United States Tik Tok Full time

    About the Role: TikTok is seeking a highly skilled Site Reliability Engineer to join our Compute Platform SRE team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our major data warehouse products, services, and query engines. Responsibilities: Ensure the reliability of all TikTok's major data warehouse...


  • San Jose, California, United States Tik Tok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Compute Platform team at TikTok. As a key member of our team, you will be responsible for ensuring the reliability and performance of our Big Data services and products. Responsibilities: Design and implement proactive measures to prevent service disruptions and ensure high...


  • San Jose, California, United States Tik Tok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Compute Platform team at TikTok. As a key member of our team, you will be responsible for ensuring the reliability and performance of our Big Data services and products. Responsibilities: Ensure the reliability of all TikTok's major data warehouse products, services, and query...


  • San Jose, California, United States Trianz Full time

    About Trianz: Trianz is a leading-edge technology platforms and services company that accelerates digital transformations at Fortune 100 and emerging companies worldwide in data & analytics, digital experiences, cloud infrastructure, and security. Our Vision: We believe that companies around the world face three challenges in their digital transformation journeys...


  • San Diego, California, United States Platform Science Full time

    About Us: At Platform Science, we're revolutionizing the way businesses connect and interact with the world around them. Our open IoT platform empowers innovative fleets, application developers, and equipment providers to deliver cutting-edge solutions to supply chain professionals globally. The Role: We're seeking a highly skilled Senior Site Reliability...


  • San Diego, California, United States Platform Science Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team in San Diego, CA (or remote). As a key member of our SRE team, you will be responsible for ensuring the reliability and performance of our cloud-based platform. Key Responsibilities: Develop and enhance CI/CD pipelines to streamline application deployment and...


  • San Jose, California, United States Adobe Full time

    About the Role: We are seeking an exceptional Site Reliability Engineering Manager to lead our team in driving reliability for Adobe's AI Inference Platform, Adobe Firefly. As a key member of our Engineering organization, you will be responsible for developing a team of Site Reliability Engineers who will work closely with our Engineering teams to build,...


  • San Jose, California, United States Adobe Full time

    About the Role: We're seeking an exceptional Site Reliability Engineering Manager to lead our AI Platform Inference Infrastructure team at Adobe. As a key member of our organization, you'll be responsible for driving reliability, scalability, and security for our AI Inference Platform, Adobe Firefly. Key Responsibilities: Develop and execute the technical vision...


  • San Jose, California, United States Tik Tok Full time

    Job Title: Site Reliability Engineer, Cloud Native Platform
    TikTok is a leading destination for short-form mobile video, inspiring creativity and bringing joy to users worldwide. Our mission is to connect people across the globe, and our infrastructure team is seeking experienced site reliability engineers to build a globally distributed edge platform for...


  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly...


  • San Jose, California, United States Altius Technologies Inc Full time

    Job Description: At Altius Technologies Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for creating and supporting automation scripts for infrastructure deployments, validations, and monitoring to improve operational tasks. Key Responsibilities: Design and implement...


  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement automation scripts using...


  • San Jose, California, United States Zscaler Full time

    About Zscaler: Zscaler is a leading cloud security company that accelerates digital transformation for its customers. With a cloud-native platform, Zscaler protects thousands of organizations from cyber threats and data loss by securely connecting users, devices, and applications worldwide. As a pioneer in cloud security, Zscaler has over 10 years of experience...


  • San Jose, California, United States Tik Tok Full time

    Job Title: Cloud Site Reliability Engineer
    We are seeking a highly skilled Cloud Site Reliability Engineer to join our team at TikTok. As a Cloud Site Reliability Engineer, you will be responsible for building, expanding, and operating ByteDance's global infrastructures, including large-scale systems in public and private clouds, data centers, and content...


  • San Jose, California, United States NetApp Full time

    Job Summary: As a Site Reliability Engineer at NetApp, you will be responsible for managing, supporting, and maintaining a reliable environment for our site. This involves ensuring the stability and security of multiple open-source systems and platforms that are run or operated in that environment. Key Responsibilities: Building and supporting a reliable site for...


  • San Jose, California, United States Adobe Full time

    About the Role: We are seeking an exceptional Site Reliability Engineer to join our team at Adobe, working on the AI Training Platform, Adobe Firefly. As a key member of our team, you will collaborate closely with Engineering teams to build, scale, and secure the AI Platform, enabling Firefly product teams to easily manage and deploy Machine Learning...


  • San Jose, California, United States Altius Technologies, Inc. Full time

    Job Title: Site Reliability Engineer
    Altius Technologies, Inc. is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure and systems that support our business applications. Key Responsibilities: Design and implement automation...


  • San Jose, California, United States Diverse Lynx Full time

    Job Title: Site Reliability Engineer
    We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement automation scripts using shell,...