Senior Staff Site Reliability Engineer

3 weeks ago


San Francisco, California, United States WEX Full time

About the Role

The WEX Site Reliability Engineering team is seeking a technical leader to drive the design and implementation of complex systems at scale. As a Senior Staff SRE, you will work closely with engineering teams to ensure that our systems are reliable, performant, and secure.

Key Responsibilities

  • Provide technical guidance and mentorship to other SREs and engineers.
  • Lead the design and implementation of complex systems and solutions.
  • Drive the adoption of SRE best practices across the organization.
  • Architect and implement highly available, scalable, and fault-tolerant systems.
  • Optimize system performance and resource utilization.
  • Proactively identify and mitigate risks to system reliability.
  • Lead incident response efforts, driving efficient resolution and post-incident analysis.
  • Develop and implement processes to improve incident response capabilities.
  • Design and develop automation tools to streamline operational tasks, improve system reliability, and reduce toil.
  • Utilize monitoring and observability tools to gain deep insights into system behavior.
  • Work closely with development teams to ensure software design meets operational requirements.
  • Foster a culture of collaboration and knowledge sharing across teams.
  • Forecast future capacity needs and implement strategies to ensure systems scale efficiently.
  • Continuously identify performance bottlenecks and lead efforts to optimize system performance.
  • Champion security best practices and ensure that systems are designed and operated in compliance with industry standards and regulations.
  • Stay current with emerging technologies and industry trends.
  • Evaluate and introduce new tools and techniques to improve SRE practices and system reliability.

Requirements

  • 7+ years of hands-on experience as a Site Reliability Engineer or in an equivalent role.
  • 7+ years of development experience with at least one major programming language.
  • Expert-level knowledge of cloud computing platforms (AWS and Azure).
  • Proven ability to lead complex technical projects and initiatives.
  • Strong communication and collaboration skills, with the ability to influence and build consensus.
  • Deep understanding of observability, logging, and monitoring technologies.
  • Experience with a variety of RDBMS and NoSQL data stores.
  • Expertise in containerization technologies such as Docker and Kubernetes.
  • Expertise in infrastructure as code.
  • Experience designing and building RESTful APIs.
  • Extensive hands-on experience with tooling such as Datadog or Splunk.
  • Familiarity with Agile methodologies and practices.
  • Extensive experience in providing and leading critical application support in a 24/7/365 high-availability environment.
  • Experience with GitOps.
  • BA/BS degree in Computer Science or related technical field, or equivalent job experience.

This Senior Staff SRE role offers a unique opportunity to make a significant impact on the reliability and performance of WEX's critical Benefits systems. You will play a key role in shaping the future of SRE at WEX and driving innovation across the organization.



  • San Francisco, California, United States WEX Full time

    The WEX Site Reliability Engineering team is seeking a Senior Staff SRE who is passionate about developing software and solutions focused on observability, incident response, reliability, and performance. The team will be part of the Benefits Reliability organization which supports our internal stakeholders and our Benefits Platform teams. As part of the...


  • San Francisco, California, United States Tampa Gardens Senior Living Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our Cloud Infrastructure Team. As a key member of our team, you will be responsible for deploying, managing, optimizing, and upgrading the systems that run Sight Machine software. You will work closely with our Development Engineering team to ensure the stability,...


  • San Francisco, California, United States Astranis Full time

    Astranis Mission: Astranis is revolutionizing global connectivity by developing the next generation of smaller, more cost-effective spacecraft. Our mission is to bridge the digital divide and connect the four billion people worldwide who lack internet access. Job Summary: We are seeking a highly motivated and experienced Senior Site Reliability Engineer to join...


  • San Francisco, California, United States Crunchyroll Full time

    About Crunchyroll: We're a global entertainment company dedicated to delivering the art and culture of anime to a passionate community. Our mission is to help everyone belong, and we're looking for talented individuals to join our team. The Role: We're seeking a Staff Site Reliability Engineer to maintain and enhance the reliability of our data infrastructure. As...


  • San Francisco, California, United States Aitopics Full time

    About the Role: We are seeking a highly skilled Staff Site Reliability Engineer to join our Data Engineering team. As a key member of our team, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make...


  • San Francisco, California, United States Twitter Full time

    Job Summary: Twitter is seeking a Senior Site Reliability Engineer to lead a team of engineers working to keep our services reliable and scalable. The ideal candidate will have experience managing services in a distributed environment and be comfortable working with on-prem and cloud-based infrastructure. Responsibilities: Lead a team of site reliability...


  • San Francisco, California, United States WEX Full time

    Job Summary: The WEX Site Reliability Engineering team is seeking a highly motivated and quick-learning individual to join our team as a Site Reliability Engineer Level 1. As a key member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems. Key Responsibilities: Actively participate in training and...


  • San Francisco, California, United States Zilliz Full time

    Job Title: Cloud Platform Staff Site Reliability Engineer. We are seeking a highly skilled Cloud Platform Staff Site Reliability Engineer to join our team at Zilliz. As a key member of our SRE team, you will be responsible for ensuring the reliability, availability, and performance of our distributed database systems. Key Responsibilities: Design and build tools...


  • San Francisco, California, United States TBWA\Chiat\Day Full time

    Job Title: Senior Site Reliability Engineer with Perplexity AI. Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Perplexity AI. As a key member of our infrastructure team, you will be responsible for designing, implementing, and scaling our cloud infrastructure to support our AI-powered search...


  • San Francisco, California, United States Astranis Full time

    Astranis is a pioneering company that aims to bridge the digital divide by connecting people worldwide who lack internet access. We're building the next generation of smaller, more cost-effective spacecraft to bring the world online. As a team, we've made significant progress, launching two satellites into orbit, signing ten commercial deals worth over $1...


  • San Francisco, California, United States HashiCorp Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our Production Engineering team at HashiCorp. As a key member of our team, you will be responsible for ensuring the reliability, performance, and robustness of our Terraform Platform. Key Responsibilities: Dive into complex problems with a focus on both immediate remediation...


  • San Francisco, California, United States Unreal Gigs Full time

    Job Title: Site Reliability Engineer. At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, scalability, and performance of our complex distributed systems. Key Responsibilities: Design and implement monitoring, logging, and alerting...


  • San Francisco, California, United States DaVita Full time

    About the Role: The WEX Site Reliability Engineering team is seeking a skilled Site Reliability Engineer to join our Platform Reliability organization. As a key member of our team, you will be responsible for developing software and solutions focused on observability, incident response, reliability, and performance. You will collaborate with our engineering...


  • San Francisco, California, United States Crusoe Full time

    About Crusoe Energy Systems: Crusoe Energy Systems is a pioneering company that aims to unlock value in stranded energy resources through the power of computation. By co-locating mobile data centers with stranded energy resources, such as flare gas and underloaded renewables, Crusoe delivers low-cost, carbon-negative distributed computing solutions. Our...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about harnessing the power of AI innovation to democratize access to cutting-edge technology and empower organizations to solve complex unstructured data problems. With a strong presence in the market and a talented team, we're committed to delivering top-tier solutions that drive business success. Job...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a global company with offices in San Francisco, New York, London, and Bengaluru. We're a people-first organization that values experimentation, curiosity, and customer obsession. Job Summary: We're seeking a Site Reliability Engineer to join our Site Reliability and Platform Engineering team. As a key member of our team, you'll be...


  • San Francisco, California, United States Withorb Full time

    About Us: Orb is a cutting-edge technology company on a mission to revolutionize the way businesses approach revenue growth. Our team is passionate about building a robust infrastructure that enables our customers to unlock their full potential. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our...


  • San Francisco, California, United States Hinge Health Full time

    About the Role: Hinge Health is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our platform, including automation, logging, monitoring, and alerting. You will thrive in a collaborative environment, have excellent communication skills, and be...


  • San Francisco, California, United States Outdefine Full time

    About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...