Site Reliability Engineer
1 week ago
We are seeking a highly skilled Site Reliability Engineer to join our Infrastructure and Assurance Services team at TikTok. As a key member of our team, you will be responsible for ensuring the operability, observability, and automation of our infrastructure, providing holistic insights and solutions to minimize manual interventions.
Responsibilities
- Perform SRE duties and operations on supported services in production, including on-call rotations, maintenance, change management, monitoring, incident response, capacity planning, and disaster recovery.
- Maximize system uptime, availability, and stability to meet functional and performance SLAs.
- Contribute to existing documentation and build effective new documentation, such as operational runbooks, SOPs, and SLAs/SLOs.
- Initiate and lead scripting/tooling/automation to streamline processes and minimize manual effort.
- Work cross-functionally and regionally with SRE/Dev/QA/PM teams to handle incidents and improve processes.
- Manage and prioritize tasks/projects for high productivity and precise deliveries.
Minimum Qualifications
- Bachelor's degree in Computer Science, a related field, or equivalent practical experience.
- Demonstrated experience in software development with one or more programming languages.
- Experience with Linux operating systems, networking, database concepts, monitoring, and shell scripting.
- Strong analytical, problem-solving, and critical-thinking skills.
- Excellent communicator, team player, self-starter, and fast learner.
Preferred Qualifications
- Master's degree in Computer Science, Engineering, or a related field.
- Proficient in any of the following languages: Python, Go, C++.
- Expertise in any of the following: SRE philosophy, AIOps, APM, disaster recovery.
- Expertise in any of these tech stacks: Kubernetes, Elasticsearch, ClickHouse, message queues, OpenTSDB, service mesh.
TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. We are passionate about this and hope you are too.
TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us.
-
Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Optomi Full time
Job Title: Site Reliability Engineer
Optomi, in partnership with a large consulting firm, is seeking an experienced Site Reliability Engineer for their remote team. This position requires a versatile, highly motivated individual capable of supplying frontline technical and operational support to our Site Reliability teams. As a vital part of the Reliability...
-
Site Reliability Engineer
2 days ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge.
Job Description
We are seeking a highly skilled Site Reliability...
-
Site Reliability Engineer
21 hours ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI-powered automation platform that helps businesses streamline their operations and improve employee productivity. Our innovative technology enables employees to find information and get support in one place, reducing costs and increasing efficiency.
Job Description
We are seeking a highly skilled Site Reliability...
-
Site Reliability Engineer
16 hours ago
Mountain View, California, United States Atlassian Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services.
Responsibilities
- Design, implement, and maintain scalable and reliable cloud infrastructure
- Collaborate with...
-
Site Reliability Engineer
1 week ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge.
Job Description
We are seeking a highly skilled Site Reliability...
-
Site Reliability Engineer
1 month ago
Mountain View, California, United States Optomi Full time
Optomi's Site Reliability Engineer Opportunity
We are seeking a skilled Site Reliability Engineer to join our team at Optomi, in partnership with a large consulting firm. This role requires a versatile and highly motivated individual who can provide frontline technical and operational support to our Site Reliability teams.
Key Responsibilities:
- Collaborate with...
-
Site Reliability Engineer
2 weeks ago
Mountain View, California, United States TikTok Full time
About the Role
We are seeking a skilled Site Reliability Engineer to join our Applied Machine Learning (AML) team. As a Site Reliability Engineer, you will be responsible for designing, building, and maintaining highly available, scalable, and fault-tolerant systems.
Responsibilities
- Design and develop large-scale systems that meet the needs of our AML...
-
Site Reliability Engineer
2 weeks ago
Mountain View, California, United States Synopsys Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our Platform Team at Synopsys. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our engineering environment. You will work closely with our development teams to design, implement, and operate scalable and efficient...
-
Principal Site Reliability Engineer
4 weeks ago
Mountain View, California, United States Groq Full time
Job Title: Principal Site Reliability Engineer
We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...
-
Principal Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Groq Full time
Job Title: Principal Site Reliability Engineer
We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...
-
Principal Site Reliability Engineer
4 weeks ago
Mountain View, California, United States Groq Full time
Job Title: Principal Site Reliability Engineer
We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...
-
Site Reliability Engineer, Edge
1 week ago
Mountain View, California, United States TikTok Full time
Job Title: Site Reliability Engineer, Edge
At TikTok, we're committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace.
About the Role
We're seeking a highly skilled Site Reliability Engineer to join our Edge team. As a...
-
Site Reliability Engineer
3 weeks ago
Mountain View, California, United States TikTok Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our AML team, where you will play a critical role in designing, building, and maintaining highly available, scalable, and fault-tolerant systems.
Responsibilities
- Design and develop large-scale systems that meet the needs of our users.
- Monitor and analyze system performance,...
-
Senior Site Reliability Engineer
2 days ago
Mountain View, California, United States Groq Full time
Job Title: Senior Site Reliability Engineer
We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Groq. As a key member of our infrastructure operations team, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services.
Key Responsibilities:
- Design and implement scalable and...
-
Senior Site Reliability Engineer
4 hours ago
Mountain View, California, United States Groq Full time
Unlock the Power of AI with Groq
We're on a mission to democratize access to AI, and we need your expertise to make it happen. As a Senior Site Reliability Engineer at Groq, you'll play a critical role in ensuring the reliability, scalability, and performance of our tools and services.
Key Responsibilities:
- Design and implement scalable and reliable...
-
Staff Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge.
Job Description
We are seeking a highly skilled Staff Site...
-
Site Reliability Engineer
2 days ago
Mountain View, California, United States TikTok Full time
About the Role
We are seeking a skilled Site Reliability Engineer to join our AML team, where you will play a critical role in designing, building, and maintaining highly available, scalable, and fault-tolerant systems.
Responsibilities
- Design and implement large-scale systems to ensure high availability and scalability.
- Monitor and analyze system performance,...
-
Site Reliability Engineer
1 day ago
Mountain View, California, United States TikTok Full time
About TikTok U.S. Data Security
TikTok is a leading destination for short-form mobile video, inspiring creativity and bringing joy to millions of users worldwide. Our mission is to empower creators and communities to express themselves authentically, while ensuring the security and integrity of our platform.
Job Summary
We are seeking a highly skilled Site...
-
Senior Site Reliability Engineer
15 hours ago
Mountain View, California, United States Groq Full time
Unlock the Power of AI with Groq
At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness, empowering a world where AI is universally accessible.
Join Our Mission
We're seeking a Senior Site...
-
Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Bayone Full time
Job Description
At Bayone, we are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for ensuring the high availability and scalability of our online production environment.
Minimum Qualifications:
- Bachelor's degree in Computer Science or a related technical field, or...