Reliability Engineer
2 weeks ago
TikTok is a premier platform for short-form mobile video, dedicated to fostering creativity and delivering joy. Our Trust and Safety engineering division is rapidly expanding, focusing on developing machine learning models and systems aimed at identifying and mitigating internet abuse and fraud across our platform.
Our objective is to safeguard billions of users and publishers globally on a daily basis. We leverage cutting-edge machine learning technologies to enhance our trust and safety systems, utilizing the vast amounts of data generated on our platform.
Through the relentless efforts of our team, TikTok strives to provide an exceptional user experience, spreading joy worldwide.
In this role, you will tackle complex scalability challenges while applying your expertise in coding, algorithms, complexity analysis, and large-scale system design.
Key Responsibilities:
- Oversee daily operations of data services and real-time/batch data pipelines, including SLA management, system deployment, performance optimization, and troubleshooting.
- Develop tools and automation to enhance system administration and operational efficiency.
- Participate in regular on-call duties.
- Contribute to and refine the entire lifecycle of services from inception and design through development, capacity planning, launch reviews, deployment, operation, and refinement.
- Design and implement software platforms and monitoring frameworks for effective, automated, and intelligent service-oriented architecture (SOA) governance.
- Ensure sustainable system scalability through automation; enhance system reliability, efficiency, and velocity by advocating for necessary changes.
- Provide sustainable user support, respond to incidents, and conduct blameless postmortems.
Qualifications:
- Bachelor's degree in Computer Science or a related field, with a minimum of 3 years of relevant experience.
- Proven independent thinking and troubleshooting abilities.
- Proficiency in programming languages such as Python, Go, C, C++, Java, or Rust.
- Familiarity with backend systems including MySQL, Redis, Nginx, Kafka, Kubernetes, and Docker, and with big data technologies such as Hadoop, Spark, Flink, Hive, and OLAP engines such as ClickHouse.
- Understanding of Unix/Linux system internals, networking, and distributed systems.
- Strong communication and coordination skills.
- Experience in Trust & Safety is advantageous.
TikTok is dedicated to fostering an inclusive environment where employees are recognized for their skills, experiences, and unique perspectives. Our platform connects individuals globally, and we aim to reflect this diversity within our workplace. We believe that all individuals should be evaluated based on their strengths and experiences, free from bias related to background or identity.
TikTok is committed to providing reasonable accommodations during our recruitment process.
-
Lead Reliability Engineer
2 months ago
Mountain View, California, United States CUSHMAN Full time
Job Title: Lead Reliability Engineer. Job Description Summary: The Lead Facilities Reliability Engineer will develop, implement and track facilities reliability and maintenance engineering programs at client site with a focus on performing facilities condition assessments and maintaining the facilities condition assessment database. Utilizing plant...
-
Embedded Site Reliability Engineer
2 months ago
Mountain View, California, United States Samsung Full time
Embedded Site Reliability Engineer (Samsung Ads). Remote type: Hybrid. Locations: 645 Clyde Avenue, Mountain View, CA, USA; One Pennsylvania Plaza, 26th Floor, New York, NY, USA. Time type: Full time. Job requisition ID: R84565. Position Summary: In recent years, Samsung has transformed its hardware dominance into a dynamic ecosystem of engaging services across devices. Enter...
-
Lead Site Reliability Engineer
1 week ago
Mountain View, California, United States Samsung Electronics Full time
Position Overview: Samsung has evolved from a hardware leader into a vibrant ecosystem of innovative services across devices. At the forefront of this transformation is Samsung Ads, a flourishing division poised for significant growth. Our Global Ads Product & Engineering team, with a robust presence across multiple countries, is integral to this advancement....
-
Senior Reliability Engineer
2 weeks ago
Mountain View, California, United States CENTRL Full time
CENTRL is looking for a skilled and proactive Senior Site Reliability Engineer to enhance our cloud and infrastructure operations. In this pivotal role, you will be responsible for the strategic oversight, planning, and implementation of our IT systems to ensure optimal performance, scalability, and availability. Key Responsibilities: Analyze and gather metrics...
-
Senior Reliability Engineer
2 weeks ago
Mountain View, California, United States CENTRL Full time
CENTRL is looking for a highly skilled and innovative professional to take on the role of Senior Site Reliability Engineer. In this pivotal position, you will be responsible for the strategic oversight, planning, and implementation of our cloud and infrastructure operations, ensuring optimal availability, scalability, and performance of our IT systems. Key...
-
Senior Reliability Engineer
2 weeks ago
Mountain View, California, United States CENTRL Full time
CENTRL is looking for a highly skilled and innovative Senior Site Reliability Engineer to take charge of our cloud and infrastructure operations. In this pivotal role, you will be responsible for the strategic oversight, planning, and implementation of our IT systems to guarantee optimal performance, scalability, and availability. Key Responsibilities: Analyze...
-
Site Reliability Engineer
4 weeks ago
Mountain View, California, United States VentureDive Full time
Job Brief: As part of Data Platform Site Reliability Engineering, you will manage infrastructure and applications on cloud computing platforms to deliver data processing, governance, and storage. Our platform teams work with exabytes of data, terabytes of memory, and hundreds of thousands of jobs to enable predictable and performant data analytics. As an SRE, you'll...
-
Reliability Engineer
2 weeks ago
Mountain View, California, United States TikTok Full time
TikTok stands as a premier platform for short-form mobile video, dedicated to fostering creativity and spreading joy. Our global presence spans numerous cities, reflecting our commitment to innovation and community. The Trust and Safety Engineering Team is rapidly expanding, focusing on the development of advanced machine learning models and systems aimed at...
-
Reliability Engineer
2 weeks ago
Mountain View, California, United States TikTok Full time
TikTok is the premier platform for short-form mobile video, dedicated to fostering creativity and spreading joy. Our Trust and Safety engineering division is rapidly expanding, focusing on the development of machine learning models and systems designed to combat internet abuse and fraud. Our objective is to safeguard billions of users and content creators...
-
Site Reliability Engineer
3 days ago
Mountain View, California, United States Insight Global Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team in the Bay Area. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud...
-
Site Reliability Engineer
1 day ago
Mountain View, California, United States Atlassian Full time
About the Role: We're seeking a highly skilled Cloud Infrastructure Engineer to join our Site Reliability team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Key Responsibilities: Design and Implement Cloud Infrastructure: Collaborate with...
-
Senior Site Reliability Engineer
1 day ago
Mountain View, California, United States Groq Full time
About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and related support systems. Key...
-
Linux Systems Engineer
2 days ago
Mountain View, California, United States Motion Recruitment Full time
About the Role: Motion Recruitment is seeking a highly skilled Linux Systems Engineer to join our team. As a Site Reliability Engineer, you will be responsible for managing and maintaining large-scale Linux environments, implementing automation, and ensuring the reliability and scalability of our systems. Key Responsibilities: Design, implement, and maintain...
-
Site Reliability Engineer
4 days ago
Mountain View, California, United States Atlassian Full time
About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Key Responsibilities: Improve Service Reliability: Actively work to improve the performance and reliability of...
-
Reliability Engineering Specialist
2 weeks ago
Mountain View, California, United States TikTok Full time
TikTok stands as a premier platform for short-form mobile video, dedicated to fostering creativity and delivering joy. Our global presence spans numerous cities, enhancing our mission to protect users and content creators worldwide. The Trust and Safety Engineering Team is rapidly expanding, tasked with developing advanced machine learning models and systems...
-
Site Reliability Engineer
3 days ago
Mountain View, California, United States eTek IT Services, Inc. Full time
Job Description: We are seeking a highly skilled Site Reliability Engineer - Cloud Infrastructure to join our team at eTek IT Services, Inc. Role: As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. Responsibilities: Data Monitoring and Alerting: Design and implement...
-
Reliability Test Engineer
2 days ago
Mountain View, California, United States Yoh Full time
Job Summary: Yoh, a Day & Zimmermann company, is seeking a highly skilled Reliability Test Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the reliability and quality of our products through various testing procedures. Key Responsibilities: Execute established reliability test procedures and perform...
-
Site Reliability Engineer
1 day ago
Mountain View, California, United States TikTok Full time
About the Role: TikTok is seeking a highly skilled Site Reliability Engineer to join our Trust and Safety engineering team. As a Site Reliability Engineer, you will be responsible for managing the day-to-day operations of our data services, including SLA management, system deployment, performance tuning, and troubleshooting. Key Responsibilities: Manage...
-
Infrastructure Reliability Engineer
1 week ago
Mountain View, California, United States Optomi Full time
Exciting Opportunity for a Systems Reliability Specialist: We are seeking a talented Systems Reliability Specialist to become part of a reputable consulting organization. If you possess a strong technical foundation and a proactive mindset, this role could be an excellent fit for you. As an integral member of the Reliability team, your primary focus will be to...
-
Reliability and Performance Engineer
6 days ago
Mountain View, California, United States BCForward Full time
Job Description: BCforward is currently seeking a highly motivated Site Reliability Engineer for an opportunity in a dynamic and innovative company. Position Title: Site Reliability Engineer. Location: Remote (with occasional on-site visits). Job Type: Contract (40 hours weekly), Hybrid. Pay Range: $95/hr - $97/hr. Please note that actual compensation may vary within...