Reliability Operations Manager
2 days ago
The Trade Desk is a leading global technology company that empowers brands to drive real connections with their customers. Our mission is to constantly improve the reliability of our platform, ensuring a seamless customer experience.
Job Summary
We are seeking a highly skilled Reliability Operations Manager to join our global Reliability Operations team. This role will be responsible for defining, managing, and measuring incident response engineering practices, liaising with engineering teams, and managing a global team of reliability operations engineers.
Key Responsibilities
- Define, manage, and measure incident response engineering practices
- Liaise with engineering teams to ensure work discovered during incident response is prioritized
- Participate in incident response engineering duties as necessary
- Manage a global Reliability Operations team (3 to 6+ reliability operations engineers across NAMER, EMEA, and APAC)
- Meet periodically with reports across time zones
- Provide weekend coverage periodically, as required
Requirements
- Bachelor's degree from a four-year university or relevant substitute experience
- 6+ years of relevant work experience in technical and/or application support, with strong knowledge of technical troubleshooting
- 2-5 years of management experience with direct reports
- A management style that adapts to the level and proficiency of each engineering report
- Ability to understand technical employee career paths and collaboratively develop career plans
- Experience scheduling a global team across time zones through holidays, sick leave, and vacations
- Understanding of large-scale distributed system architectures (e.g., databases, web services, application services)
- Familiarity with monitoring tools (e.g., Prometheus, Grafana, Nagios)
- Ability to author scripts to facilitate troubleshooting as well as configure alerts
- Proficiency in scripting languages (e.g., Python, Bash) is a plus
The Trade Desk offers a competitive total compensation and benefits package, including comprehensive healthcare, retirement benefits, short and long-term disability coverage, basic life insurance, well-being benefits, reimbursement for certain tuition expenses, parental leave, sick time, vacation time, and around 13 paid holidays per year.
Employees can also purchase The Trade Desk stock at a discount through The Trade Desk's Employee Stock Purchase Plan.
-
Reliability Operations Manager
3 weeks ago
San Jose, California, United States The Trade Desk Full time
About The Trade Desk
The Trade Desk is a leading global technology company that empowers brands to connect with consumers through its innovative, cloud-based platform. Our mission is to deliver exceptional customer experiences by ensuring the reliability and performance of our platform.
Job Summary
We are seeking a highly skilled Reliability Operations Manager...
-
Senior Site Reliability Manager
3 days ago
San Jose, California, United States Triune Infomatics Inc Full time
Role: Senior Site Reliability Manager
Triune Infomatics Inc is seeking an experienced Senior Site Reliability Manager to join our team and contribute to the design and upkeep of our cloud-based IoT edge orchestration solution.
Job Summary:
The Senior Site Reliability Manager will be responsible for ensuring the availability of our SaaS platform and meeting the...
-
Reliability Engineer
3 days ago
San Jose, California, United States NetApp Full time
Job Summary
As a Site Reliability Engineer, you will be responsible for ensuring the stability and security of multiple open-source systems and platforms that are run or operated in our environment.
Key Responsibilities
Building and maintaining a reliable site environment to meet the development and maintenance requirements of open-source systems and...
-
Reliability Engineer
4 weeks ago
San Jose, California, United States Antora Energy Full time
Job Title: Sr. Reliability Engineer
At Antora Energy, we're committed to revolutionizing the way industries approach energy storage. As a Sr. Reliability Engineer, you'll play a pivotal role in ensuring the high reliability and availability of our thermal battery systems.
Key Responsibilities:
Collaborate with cross-functional teams to scope, define, design,...
-
San Jose, California, United States Adobe Full time
About the Role
We are seeking an exceptional Site Reliability Engineering Manager to lead our team in driving reliability for Adobe's AI Inference Platform, Adobe Firefly. As a key member of our Engineering organization, you will be responsible for developing a team of Site Reliability Engineers who will work closely with our Engineering teams to build,...
-
Site Reliability Engineer
4 weeks ago
San Jose, California, United States Diverse Lynx Full time
Job Title: Site Reliability Engineer
We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure.
Key Responsibilities:
Design and implement automation scripts using shell,...
-
Reliability Expert
1 month ago
San Jose, California, United States Power Integrations Full time
Job Summary
We are seeking a highly skilled Senior Reliability Engineer to join our team at Power Integrations. As a key member of our reliability engineering team, you will be responsible for evaluating the reliability of IC products, packages, and process technology to ensure suitability for end applications and conformance to industry standards.
Key...
-
Site Reliability Engineer
3 weeks ago
San Jose, California, United States Syntricate Technologies Full time
Job Title: Site Reliability Engineer
We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure.
Key Responsibilities:
Design, implement, and maintain scalable and highly...
-
Site Reliability Engineer
4 weeks ago
San Jose, California, United States NetApp Full time
Job Summary
As a Site Reliability Engineer at NetApp, you will be responsible for managing, supporting, and maintaining a reliable environment for our site. This involves ensuring the stability and security of multiple open-source systems and platforms that are run or operated in that environment.
Key Responsibilities
Building and supporting a reliable site for...
-
Site Reliability Engineer
4 days ago
San Jose, California, United States NetApp Full time
Job Summary
We are seeking a highly skilled Site Reliability Engineer to join our team at NetApp. As a Site Reliability Engineer, you will be responsible for managing, supporting, and maintaining a reliable environment for our site to ensure the stability and security of multiple open-source systems/platforms.
Key Responsibilities
Building and supporting a...
-
Site Reliability Engineer
4 days ago
San Jose, California, United States Tik Tok Full time
Transforming Data Infrastructure with TikTok
TikTok is a pioneer in innovation, merging software development and infrastructure operations to design, build, and manage large-scale, highly distributed systems. Our Site Reliability Engineering (SRE) team is a key player in this journey, overseeing one of the industry's most extensive cloud...
-
Site Reliability Engineer
2 days ago
San Jose, California, United States Diverse Lynx Full time
Job Title: Site Reliability Engineer
Job Summary:
Diverse Lynx LLC is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure.
Key Responsibilities:
Design and implement automation scripts using shell,...
-
Site Reliability Engineer
4 weeks ago
San Jose, California, United States Trianz Full time
About Trianz
Trianz is a leading-edge technology platforms and services company that accelerates digital transformations at Fortune 100 and emerging companies worldwide in data & analytics, digital experiences, cloud infrastructure, and security.
Our Vision
We believe that companies around the world face three challenges in their digital transformation journeys...
-
Staff Site Reliability Engineer
4 weeks ago
San Jose, California, United States Zscaler Full time
About Zscaler
Zscaler is a leading cloud security company that accelerates digital transformation for its customers. With a cloud-native platform, Zscaler protects thousands of organizations from cyber threats and data loss by securely connecting users, devices, and applications in any location.
As a pioneer in cloud security, Zscaler has over 10 years of...
-
Site Reliability Engineer
4 weeks ago
San Jose, California, United States Tik Tok Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our dynamic team at TikTok. As a pioneer in innovation, our data infrastructure SRE team seamlessly merges software development and infrastructure operations to design, build, and manage large-scale, highly distributed systems.
Key Responsibilities
Participate in and enhance the...
-
Site Reliability Engineer
4 days ago
San Jose, California, United States Tik Tok Full time
About Us
TikTok is a global leader in short-form mobile video, inspiring creativity and bringing joy to users worldwide. Our mission is to empower creators and communities to thrive in a vibrant, inclusive space.
Job Summary
We're seeking a skilled Site Reliability Engineer to join our dynamic team, driving innovation and excellence in our cloud infrastructure....
-
Operations Manager
2 weeks ago
San Francisco, California, United States MP Mine Operations LLC Full time
Job Title: Operations Supervisor
MP Materials is seeking an experienced Operations Supervisor to join our team at our mining and processing site in Mountain Pass, California. As an Operations Supervisor, you will be responsible for managing and directing mineral processing and/or chemical plant activities.
Key Responsibilities:
Oversee safe, reliable, and...
-
Senior Site Reliability Engineer
3 weeks ago
San Jose, California, United States F5 Full time
Job Summary
F5 is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will play a pivotal role in ensuring the reliability and scalability of our distributed cloud product.
Key Responsibilities
- Design and implement automation solutions to reduce toil and improve operational efficiency
- Participate in...
-
Senior Site Reliability Engineer
3 weeks ago
San Jose, California, United States Hireio, Inc. Full time
About the Role
Hireio, Inc. is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our data infrastructure team, you will be responsible for designing, building, and managing large-scale, highly distributed systems.
Our team is a pioneer in innovation, seamlessly merging software development and infrastructure...