Cloud Reliability Operations Engineer

2 months ago


Seattle, Washington, United States TikTok Full time

About TikTok
TikTok is a premier platform for short-form mobile video, dedicated to fostering creativity and spreading joy. With a global presence, we have offices in various major cities worldwide.

Why Join Us
At TikTok, creation is at the heart of our mission. Our platform is designed to empower creativity, and this ethos extends to our teams. We believe in tackling challenges as opportunities for learning, innovation, and collective growth.

Infrastructure Engineering Team
Our Infrastructure Engineering team plays a crucial role in supporting TikTok's rapid expansion by constructing and managing hyper-scale data centers, overseeing server fleet life cycles, delivering cloud solutions, and developing robust infrastructure services that ensure scalability and reliability.

Key Responsibilities

  • Enhance and manage ByteDance's global infrastructure, which includes extensive systems across public and private clouds, data centers, and content delivery networks.
  • Develop tools, automation, visualizations, and monitoring systems to optimize global infrastructure operations.
  • Engage in technical operations and rotations to address performance and reliability challenges.
  • Contribute to the entire lifecycle of infrastructure services, from initial design through development to deployment and ongoing support.

Qualifications
Applicants should possess a Master's degree (or a Bachelor's degree with a minimum of 3 years of relevant experience) in Computer Engineering, Electrical Engineering, Computer Science, or a related field. Candidates should also have:

  • 3+ years of experience with Unix/Linux systems, including system libraries, file systems, and client-server protocols.
  • 3+ years of experience with essential system-level services and tooling such as DNS, APT, LDAP, Nginx, CI/CD, Ansible, and Packer.
  • 2+ years of experience programming in a language such as Java, C++, or Go, or scripting in Shell or Python.

Desired Skills
We value self-motivated individuals who can navigate ambiguity and drive projects from concept to execution. Strong analytical skills and the ability to solve complex problems in a dynamic environment are essential. Candidates should also bring:

  • Experience designing and building automation tools for large-scale systems.
  • Experience creating solutions with AWS, Google Cloud, OCI, and other cloud services.
  • Effective communication and collaboration skills.

Preferred Qualifications
Familiarity with:

  • Kubernetes and microservices architecture.
  • Web application design and implementation.
  • Database design and administration.
  • Unit, integration, and performance testing.
  • System and data security practices.

Commitment to Diversity
TikTok is dedicated to fostering an inclusive environment where all employees are recognized for their unique skills, experiences, and perspectives. We are passionate about creating a workplace that reflects the diverse communities we serve.