Site Reliability Engineer

2 weeks ago


Seattle, United States TikTok Full time

About TikTok U.S. Data Security

TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained. The teams within USDS that deliver on this commitment daily span across Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions and more.

Why Join Us

Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible. Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day. To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always. At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve. Join us.

Site Reliability Engineering (SRE) of the IC (Intelligent Creation) team empowers content creation and interaction via visual intelligence and artificial intelligence for the United States and all around the world. You will have the opportunity to work with product teams on the latest AI Generative Content, Intelligent Editing, and Content Understanding technologies. Join us and you'll be able to shape the future of IC systems and make a real, tangible impact on TikTok users.

About the Role:

The Intelligent Creation SRE Team is seeking an experienced Site Reliability Engineer to help us continue improving TikTok's content creation platform. If you are passionate about ensuring software reliability, love problem-solving, and are prepared for exciting challenges, we would like you on our team.

Responsibilities:

- Provide site reliability engineering support to deploy and maintain the content creation platform, including training, inference, and pipeline orchestration in the production environment, under the guidance of senior-level SREs.
- Continuously integrate and deploy our services to the cloud environment, ensuring optimal performance and reliability.
- Develop and maintain software while looking into performance bottlenecks and debugging software issues.
- Engage in service capacity planning and demand forecasting, software performance analysis, and system tuning.
- Assist the team in managing frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance.
- Monitor health and performance of 100+ microservices that power TikTok's content creation platform; intervene as needed to rectify outages or issues.

Basic:

- A minimum of 2 years of previous experience as an SRE or in a similar software engineering role.
- Ability to write clean, maintainable code, with proficiency in languages like Python, Java, or Go.
- Extensive experience working within a cloud environment, with tools such as AWS, GCP, or Azure.
- Strong understanding of software development and cloud architecture best practices.
- Experience supporting microservices at scale; familiarity with observability tools desirable.
- Excellent problem-solving skills and an ability to manage complex tasks efficiently.
- Good communication skills for effective collaboration within the team and with external departments.

Preferred:

- Prior experience using tools like Kubernetes, Docker, Prometheus, or other similar technologies.
- Knowledge of or experience in DevOps methodologies and continuous integration/continuous deployment (CI/CD) processes.
- Familiarity with network protocols, security, and DNS.
- Certifications from recognized bodies in relevant fields, e.g. Google Certified Professional – Cloud Architect, AWS Certified DevOps Engineer.
- Knowledge of Machine Learning and AI concepts could be advantageous.

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at usds.accommodations@tiktokusds.com.

This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security-related screening.



  • Seattle, United States Sogeti Full time

    Site Reliability Engineer FTE with benefits Our team is looking to add an experienced Site Reliability / DevOps Engineer to our team. Experienced with Python and Shell Scripting. Should have extensive experience with Azure or AWS (Azure preferred). Experience with Monitoring and Observability - Datadog. Experience with Infrastructure as Code - specifically...


  • Seattle, United States Capgemini Full time

    Site Reliability Engineer FTE with benefits Our team is looking to add an experienced Site Reliability / DevOps Engineer to our team. Experienced with Python and Shell Scripting. Should have extensive experience with Azure or AWS (Azure preferred). Experience with Monitoring and Observability - Datadog. Experience with Infrastructure as Code - specifically...


  • Seattle, United States Capgemini Full time

    **Site Reliability Engineer** **FTE with benefits** Our team is looking to add experienced Site Reliability / DevOps Engineer to our team. + Experienced with **Python and Shell Scripting.** + **Should have extensive experience with Azure or AWS (Azure preferred)** + **Experience with Monitoring and Observability - Datadog** + **Experience with Infrastructure as...


  • Seattle, United States Axon Full time

    Join Axon and be a Force for Good. At Axon, we're on a mission to Protect Life. We're explorers, pursuing society's most critical safety and justice issues with our ecosystem of devices and cloud software. Like our products, we work better together. We connect with candor and care, seeking out diverse perspectives from our customers, communities and each...


  • Seattle, United States Sogeti Full time

    Lead Site Reliability Engineer Seattle, WA FTE/ Direct hiring with benefits No Remote - Onsite and Hybrid position from WA location only Qualification & Skills 8+ years of experience in Site Reliability Engineering or related field Develop, maintain and configure cloud observability systems (e.g., Datadog, Splunk, OpenTelemetry, APM, etc.). Build flexible...


  • Seattle, United States Saxon Global Full time

    Starbucks Senior Site Reliability Engineer (Cloud) 8-month contract (Likely extension to 18 months with strong performance) Hybrid - (Must be local to the Seattle area, onsite at Starbucks headquarters 3 days a week with 2 days remote) Job Summary and Mission This position contributes to Starbucks on their Data Platform Services team. This team maintains and...


  • Seattle, Washington, United States TikTok Full time

    Site Reliability Engineer - Video Platform - USDS (SEA) Seattle Regular R&D Job ID: A197783 Responsibilities About TikTok U.S. Data Security TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was...


  • Seattle, United States INSPYR Solutions Full time

    Title: Site Reliability Engineer Location: Seattle, WA (Hybrid 2-3 days on-site) Duration: 1+ year contract, (Possibility of conversion) Compensation: $85-$95.40/hour Work Requirements: US Citizen, GC Holders or Authorized to Work in the U.S. Skillset / Experience: You will be taking a lead role, interacting with a squad of experienced AWS software...


  • Seattle, United States SingleStore Full time

    Position Overview MemSQL is seeking a Senior Site Reliability Engineer to help drive our Kubernetes product strategy surrounding our managed service. You will be at the forefront; crafting the design, building out the collaborated vision, and sustaining your envisioned product strategy. This role will be an integral part of building our managed service...


  • Seattle, United States F5 Networks Full time

    At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation. Everything we do centers around...

