Site Reliability Engineer

6 days ago


San Francisco, United States Arbitrum Full time

Our mission is to bring blockchain to a billion people. The Alchemy Platform is a world-class developer platform designed to make building on the blockchain easy. We've built leading infrastructure in the space, powering over $105 billion in transactions for tens of millions of users in 99% of countries worldwide.

The Alchemy team draws from decades of deep expertise in massively scalable infrastructure, AI, and blockchain from leadership roles at leading companies and universities like Google, Microsoft, Facebook, Stanford, and MIT.

Alchemy recently raised a Series C1 at a $10.2B valuation led by Lightspeed and Silver Lake. Previously, Alchemy raised from a16z, Coatue, Addition, Stanford University, Coinbase, the Chairman of Google, Charles Schwab, and the founders and executives of leading organizations.

As an engineer in the Infrastructure department at Alchemy, you will collaborate with our engineering team to design, deploy, and continuously improve the infrastructure supporting our globally used developer platform. Your focus will be on enhancing developer productivity and ensuring product reliability as we scale.

The Infrastructure team’s mission is to provide the infrastructure, tooling and expertise that Alchemy engineers need to ship, scale and operate high-quality products for our customers in a fast, safe and cost-efficient manner.

Come and help us build, maintain and scale the underlying infrastructure behind products that delight our customers with their reliability, latency and cost.

What You'll Do:

  • Set high standards for Reliability at Alchemy
  • Develop and own company-wide Reliability best practices like SLO definition, incident management, postmortem reviews, launch readiness reviews, and change management
  • Architect production infrastructure and tools that encourage and enforce high reliability
  • Inspire the broader engineering organization to ensure Reliability is a first-class citizen in the products we build
  • Partner with, advise, review and mentor engineering teams on Reliability topics like high-reliability architecture, observability, and safe change management
  • Improve critical infrastructure and systems that are used to operate infrastructure at scale (e.g., compute, networking, deployment, observability, and code tooling/libraries)
  • Develop and own best practices for managing production infrastructure: provisioning, application scaling, configuration management, capacity planning, monitoring, etc.
  • Develop and own best practices for developer processes: CI/CD, dev and staging environments, etc.
  • Provide input into long-term platform requirements and operational guidelines with a focus on reliability
  • Continuously raise our standard of engineering excellence by implementing best practices for coding, testing, and deployment
  • Build and maintain documentation around process and workflows

What We're Looking For:

  • 6+ years of experience as an Infrastructure Engineer focused on Reliability (e.g., Site Reliability Engineer, Production Engineer, Platform Engineer)
  • Experience leading and driving company-wide reliability efforts and engineering initiatives
  • Experience with observability best practices and tooling like Prometheus, Grafana and Datadog
  • Experience designing and operating large-scale, multi-region production systems
  • Experience working with AWS or other cloud infrastructures
  • Experience with container schedulers and runtimes such as Docker and Kubernetes
  • Experience with Infrastructure-as-Code (e.g., Terraform, Pulumi, Chef, Puppet)
  • The cross-functional nature of this role requires strong communication and collaboration skills
  • (Preferred) Experience with running production services on bare-metal
  • (Preferred) Experience with TypeScript and Python
  • (Preferred) Excellent understanding of web applications and architecture

More on The Role

Alchemy is committed to offering competitive compensation, including base salary as well as equity. Additionally, Alchemy offers comprehensive medical, dental, and vision coverage, as well as other benefits such as 401k and unlimited flexible time off.

The base salary range for this position is estimated to be between $135,000 and $350,000 annually. Please note this range reflects base salary only, and does not include bonus, equity, or benefits. Your salary will be determined by various factors, including relevant experience, skill set, qualifications, and other business needs.

