Current jobs related to Director of Site Reliability Engineering - San Francisco, California - DataRobot
-
Staff Site Reliability Engineer
3 weeks ago
San Francisco, California, United States Crunchyroll Full time About Crunchyroll: We're a global entertainment company dedicated to delivering the art and culture of anime to a passionate community. Our mission is to help everyone belong, and we're looking for talented individuals to join our team. The Role: We're seeking a Staff Site Reliability Engineer to maintain and enhance the reliability of our data infrastructure. As...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Unreal Gigs Full time Job Title: Site Reliability Engineer. At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, scalability, and performance of our complex distributed systems. Key Responsibilities: Design and implement monitoring, logging, and alerting...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States DaVita Full time About the Role: The WEX Site Reliability Engineering team is seeking a skilled Site Reliability Engineer to join our Platform Reliability organization. As a key member of our team, you will be responsible for developing software and solutions focused on observability, incident response, reliability, and performance. You will collaborate with our engineering...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Roman Health Pharmacy LLC Full time About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based platform. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities. Conduct in-depth incident...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Instabase Full time About Instabase: At Instabase, we're passionate about harnessing the power of AI innovation to democratize access to cutting-edge technology and empower organizations to solve complex unstructured data problems. With a strong presence in the market and a talented team, we're committed to delivering top-tier solutions that drive business success. Job...
-
Staff Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Aitopics Full time About the Role: We are seeking a highly skilled Staff Site Reliability Engineer to join our Data Engineering team. As a key member of our team, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Instabase Full time About Instabase: Instabase is a global company with offices in San Francisco, New York, London, and Bengaluru. We're a people-first organization that values experimentation, curiosity, and customer obsession. Job Summary: We're seeking a Site Reliability Engineer to join our Site Reliability and Platform Engineering team. As a key member of our team, you'll be...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Withorb Full time About Us: Orb is a cutting-edge technology company on a mission to revolutionize the way businesses approach revenue growth. Our team is passionate about building a robust infrastructure that enables our customers to unlock their full potential. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States BaseTen Labs, Inc. Full time About BaseTen Labs, Inc.: We're a rapidly growing team of innovators backed by top-tier investors, including IVP, Spark Capital, and Sarah Guo at Conviction. Our mission is to empower machine learning teams at enterprises and AI-native companies to build scalable, reliable, and efficient infrastructure. Job Description: We're seeking a skilled Site Reliability...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Outdefine Full time About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...
-
Site Reliability Engineer
3 weeks ago
San Francisco, California, United States Roman Health Pharmacy LLC Full time About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities. Conduct in-depth...
-
Site Reliability Engineer
3 weeks ago
San Francisco, California, United States YO HR CONSULTANCY Full time Job Title: Site Reliability Engineer. Job Description: At YO HR CONSULTANCY, we are seeking a highly skilled Site Reliability Engineer to join our team. Key Responsibilities: * Extensive experience working with Linux flavors like RHEL/CentOS OS, shells, filesystems, and utilities * Knowledge of distributed computing and experience working with container...
-
Site Reliability Engineering Director
4 weeks ago
San Diego, California, United States Becton, Dickinson & Company Full time About the Role: A Site Reliability Engineering Manager at Becton, Dickinson & Company is responsible for ensuring the smooth operation of complex systems and services. They oversee a team of Site Reliability Engineers to maintain infrastructure, handle incident response, and implement continuous improvement initiatives. Key Responsibilities: Lead a team of Site...
-
Site Reliability Engineer
3 weeks ago
San Francisco, California, United States Orb Full time About the Role: Orb is seeking a skilled Site Reliability Engineer to join our team. As a key member of our engineering organization, you will play a critical role in maintaining and scaling our robust infrastructure, ensuring stability, scalability, and performance. You will be responsible for tackling complex engineering challenges, from scaling our data...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States SpeedCast Full time Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Speedcast. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our communication products. Key Responsibilities: Analyze and design continuous integration/continuous delivery...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States SpeedCast Full time Job Title: Site Reliability Engineer. At Speedcast, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our communication products. Key Responsibilities: Analyze and design continuous integration/continuous delivery...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Swish Analytics Full time Site Reliability Engineer at Swish Analytics. Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...
-
Site Reliability Engineer
3 weeks ago
San Francisco, California, United States GRNET Full time GRNET is seeking a highly skilled Site Reliability Engineer to join its team. As an SRE, you will be responsible for designing and implementing fault-tolerant, scalable, and distributed services. You will work closely with the team to bring your technical opinion and vision to the table, handle problems that require under-the-hood investigation, and lead...
-
Site Reliability Engineer
3 weeks ago
San Francisco, California, United States Perplexity Full time Site Reliability Engineer. Perplexity is seeking a highly skilled Site Reliability Engineer to join our team in revolutionizing the way people interact with the internet. As a key member of our infrastructure team, you will be responsible for designing, implementing, and scaling the systems that support our web and mobile products. Key Responsibilities: Design...
Director of Site Reliability Engineering
2 months ago
DataRobot is seeking a highly skilled Director of Site Reliability Engineering to lead our SRE team. As a key member of our organization, you will be responsible for ensuring the reliability, scalability, and performance of our platform.
Key Responsibilities
- Manage and develop a team of Site Reliability Engineers, including hiring, performance assessments, and career development.
- Work with cross-functional teams to identify and prioritize features, balancing tactical needs with strategic goals.
- Collaborate with engineers to develop and enhance tools for large-scale services technologies, ensuring high standards in system design and code quality.
- Translate business needs into engineering requirements, manage projects and dependencies, and lead sprint activities.
- Improve operations by analyzing root causes, defects, and technical debt, and implementing solutions to reduce operational load.
- Support large-scale services, manage high-pressure situations, and participate in on-call rotations. Troubleshoot issues from infrastructure to application scaling.
- Diagnose and fix issues by editing code, modifying infrastructure configurations, and creating reusable tooling.
- Develop automation tools and optimize services through version-controlled infrastructure-as-code. Conduct network and performance analysis.
Requirements
- Experience with AWS, or strong experience in GCP or Azure.
- Experience with Kubernetes on multiple cloud provider platforms.
- Linux/UNIX (Ubuntu, RedHat, or similar).
- Strong knowledge of TCP/IP networking, SSL, DNS, Load Balancers.
- Application Performance Monitoring principles.
- Bachelor's Degree in CS, MIS, or equivalent experience.
- 7-8 years of relevant experience in an SRE or DevOps role with equivalent responsibilities.
- Demonstrated success managing a team of high-performing professionals.
- Solid communication skills.
DataRobot is the AI Cloud leader, delivering a unified platform for all users, all data types, and all environments to accelerate delivery of AI to production. It is trusted by global customers across industries and verticals, including a third of the Fortune 50, and has delivered over a trillion predictions for leading companies globally.
DataRobot is committed to providing a safe and secure environment for all job applicants. We encourage all job seekers to be vigilant and protect themselves against recruitment scams by verifying the legitimacy of any job offer before providing personal information or paying any fees.