Lead Site Reliability Engineer

4 weeks ago


Remote, Oregon, United States Henry Meds Full time

About Henry Meds:
Tens of millions of Americans are unable to manage their chronic conditions with commercial medications. Using specialized compounded formulas tailored to individual patient needs, Henry helps people who have been left behind by the commercial market, all while remaining easy, accessible, and affordable. Our customers get access to the care they need and save thousands of dollars on out-of-pocket healthcare expenses per year.

Enjoy the casual culture, remote-first workplace, and generous PTO/benefits.

Apply today to make a direct, daily impact in one of the fastest-growing startups in the country. We are excited to meet you.

Position Overview:
We are seeking our first Lead Site Reliability Engineer (SRE). In this role, you will ensure the reliability, scalability, and performance of complex systems and cloud infrastructure. From the start, you will outline observability guidelines for the company. The role involves close collaboration with engineering and security teams to integrate SRE principles throughout the software development lifecycle. Strong analytical skills are essential for diagnosing and resolving issues, while leadership abilities are crucial for mentoring junior engineers and fostering a culture of continuous improvement. As the first SRE hire, you will be instrumental in building the team and setting the direction of our DevOps culture. You will assist in hiring for our SRE, Platform, and Shared Services teams.

Duties and Responsibilities:

  • Architect and create our observability and monitoring system.
  • Create a disaster recovery plan and facilitate disaster recovery testing. Familiarity with DiRT exercises is a plus.
  • Define how configurations and networking are managed per environment, and how all systems are monitored, supported, and scaled in the production environment.
  • Oversee teams who are responsible for the design, architecture, and development of operational infrastructure within our platform.
  • Assist in hiring engineers to perform daily operations and embed SRE practices across the department.
  • Provide architectural and technical guidance and mentorship to SRE teams, fostering skill development, and building strong and capable SRE practices.
  • Lead and prioritize multiple projects, create roadmaps, and drive implementation plans.
  • Partner with product and engineering stakeholders to proactively identify operational needs and deliver solutions.

You will likely have:

  • Experience in GCP working with stakeholders to develop and document resilient services across multiple edge and availability zones, with comprehensive documented disaster recovery plans, and regularly conducting drills and exercises to test and validate the effectiveness of these plans.
  • Experience managing identity and access management (IAM) to control resources and services in GCP, and working with stakeholders to develop security practices and procedures that ensure compliance with industry best practices and regulations.
  • Experience managing the security and monitoring systems in our cloud that ensure our systems' health.
  • Experience leading incident management processes, conducting post-mortems, and driving improvements to prevent future incidents.
  • Experience setting up availability expectations, addressing performance issues, uncovering observability gaps, leading problem management, and driving capacity planning.
  • The ability to manage cloud operations, including installing, maintaining, and monitoring network resources.
  • Experience defining SLOs and SLIs, leading on-call support schedules, troubleshooting, building support playbooks, implementing monitoring, alerting, and logging standards, and conducting performance testing.
  • Experience creating playbooks utilizing a chaos engineering mindset and resilience testing.
  • Experience architecting Infrastructure as Code using Terraform.

You may have:

  • 10+ years of overall experience in a DevOps or Site Reliability Engineering environment
  • 2+ years of leading Cloud SRE teams across AWS and Google Cloud Platform
  • 5+ years of hands-on experience with infrastructure design and deployment utilizing cloud PaaS and IaaS offerings
  • 5+ years of experience in cloud and system observability (Datadog, Grafana, Cloud Profiler) and alerting (OpsGenie, PagerDuty, GCP Cloud Monitoring)
  • 5+ years of experience architecting and building infrastructure with a focus on redundancy, reliability, disaster response and discovery
  • 5+ years of configuration/management experience with cloud networking technologies (GCP IAM model, Terraform, gcloud-cli)
  • 5+ years of cloud operations knowledge with automation solutions
  • 5+ years of experience with cloud solutions (Google Cloud Platform), Cloud Run, containers, Terraform, GCS, C#, and TypeScript

Company Offers:

  • Platinum PPO Healthcare + Vision & Dental (Henry covers 99% for employees and 50% for their qualified dependents).
  • 401(k) with matching contributions beginning your first day.
  • Unlimited PTO.
  • Fully remote position with occasional travel.
  • Impactful, rewarding work as part of a fast-growing brand helping thousands of people every day.

Equal Opportunity Statement:

Henry Meds is committed to promoting an inclusive work environment free of discrimination and harassment. We value a diverse and balanced team where everyone can belong.

Applicants must be authorized to work for ANY employer in the U.S. We cannot sponsor or take over sponsorship of an employment visa at this time.



