Platform - Site Reliability Engineer II

1 month ago

Mountain View, United States Elastic Full time

Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale - unleashing the potential of businesses and people. The Elastic Search AI Platform, used by more than 50% of the Fortune 500, brings together the precision of search and the intelligence of AI to enable everyone to accelerate the results that matter. By taking advantage of all structured and unstructured data - securing and protecting private information more effectively - Elastic's complete, cloud-based solutions for search, security, and observability help organizations deliver on the promise of AI.

What Is the Role:

As part of the Platform Engineering department, the SRE team designs, builds, scales, and maintains the multi-cloud platform that hosts internal and external services such as Elastic Cloud Hosted and Serverless. This includes developing new software and tools that support the rest of the infrastructure, so that we can rapidly deploy products from all corners of the organization. We need help on this journey to offer a truly exceptional customer experience. This is where you come in.

What You Will Be Doing:

Lead technical initiatives aimed at improving the reliability of the global Elastic infrastructure, taking an engineering approach to the prevention, detection, and timely mitigation of issues.
Contribute to SRE engineering through auto-remediation and systems engineering work that reduces human intervention by automating processes and operational tasks.
Develop and maintain software, tooling, and automation to support the ever-growing scaling demands of this global infrastructure.
Champion an environment focused on collaboration, operational excellence, and uplifting others.
Respond to major incidents, correcting and improving systems to prevent incidents and grow at scale. Participate in a weekly on-call rotation, using a follow-the-sun model.

What You Bring:

A well-rounded view of and true appreciation for reliability, born of real-world experience operating production services. You have examples of using software engineering practices and SRE principles to solve operational problems.
A background in software engineering and the ability to collaborate confidently with engineers to identify and resolve issues, ideally with experience in public cloud and managed Kubernetes services.
Outstanding interpersonal skills and the ability to build strong relationships through inclusive communication. Examples of working in distributed teams or working remotely are desirable.

Bonus Points:

You don't need to have all of these items, but these represent the types of work you will do as a Site Reliability Engineer at Elastic.

You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
You have built or managed a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and the vital automation to support it.
You have written non-trivial programs in Go.
You have worked with containerized services (such as Docker).
You have experience in system administration with professional skills in Linux on distributed systems at scale.
You have designed, implemented or diagnosed and resolved issues with the Elastic Stack.
You have demonstrable experience in leading and improving alerting and major incident management standard processes, and in using metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impacts to share with others at varying levels of the organization.
You are experienced in contributing in a self-organizing and collaborative team environment.
You have mentored, coached, and grown team members to bring out the best in them.

Additional Information - We Take Care of Our People:

As a distributed company, diversity drives our identity. Whether you're looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn't matter if you're just out of college or your children are; we need you for what you can do.

We strive to have parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

Competitive pay based on the work you do here and not your previous salary
Health coverage for you and your family in many locations
Ability to craft your calendar with flexible locations and schedules for many roles
Generous number of vacation days each year
Increase your impact - We match up to $2000 (or local currency equivalent) for financial donations and service
Up to 40 hours each year to use toward volunteer projects you love
Embracing parenthood with a minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.

Applicants have rights under Federal Employment Laws; view posters linked below: Family and Medical Leave Act (FMLA) Poster; Pay Transparency Nondiscrimination Provision Poster; Employee Polygraph Protection Act (EPPA) Poster and Know Your Rights (Poster)

Elasticsearch develops and distributes technology and information that is subject to U.S. and other country export controls and licensing requirements for individuals who are located in or are nationals of the following sanctioned countries and regions: Belarus, Cuba, Iran, North Korea, Russia, Syria, the Crimea Region of Ukraine, the Donetsk People's Republic ("DNR"), and the Luhansk People's Republic ("LNR"). If you are located in or are a national of one of the listed countries or regions, an export license may be required as a condition of your employment in this role. Please note that national origin and/or nationality do not affect eligibility for employment with Elastic.

Please see here for our Privacy Statement.

Compensation for this role is in the form of base salary. This role does not have a variable compensation component. The typical starting salary range for new hires in this role is listed below. In select locations (including Seattle WA, Los Angeles CA, the San Francisco Bay Area CA, and the New York City Metro Area), an alternate range may apply as specified below.

These ranges represent the lowest to highest salary we reasonably and in good faith believe we would pay for this role at the time of this posting. We may ultimately pay more or less than the posted range, and the ranges may be modified in the future. An employee's position within the salary range will be based on several factors including, but not limited to, relevant education, qualifications, certifications, experience, skills, geographic location, performance, and business or organizational needs.

Elastic believes that employees should have the opportunity to share in the value that we create together for our shareholders. Therefore, in addition to cash compensation, this role is currently eligible to participate in Elastic's stock program. Our total rewards package also includes a company-matched 401k with dollar-for-dollar matching up to 6% of eligible earnings, along with a range of other benefits offered with a holistic emphasis on employee well-being.

The typical starting salary range for this role is: $110,900-$165,500 USD

The typical starting salary range for this role in the select locations listed above is: $110,900-$165,500 USD


