Senior Staff Site Reliability Engineer

3 days ago


San Francisco, California, United States | WEX | Full time

About the Role

The WEX Site Reliability Engineering team is seeking a technical leader to drive the design and implementation of complex systems at scale. As a Senior Staff SRE, you will work closely with engineering teams to ensure that our systems are reliable, performant, and secure.

Key Responsibilities

  • Provide technical guidance and mentorship to other SREs and engineers.
  • Lead the design and implementation of complex systems and solutions.
  • Drive the adoption of SRE best practices across the organization.
  • Architect and implement highly available, scalable, and fault-tolerant systems.
  • Optimize system performance and resource utilization.
  • Proactively identify and mitigate risks to system reliability.
  • Lead incident response efforts, driving efficient resolution and post-incident analysis.
  • Develop and implement processes to improve incident response capabilities.
  • Design and develop automation tools to streamline operational tasks, improve system reliability, and reduce toil.
  • Utilize monitoring and observability tools to gain deep insights into system behavior.
  • Work closely with development teams to ensure software design meets operational requirements.
  • Foster a culture of collaboration and knowledge sharing across teams.
  • Forecast future capacity needs and implement strategies to ensure systems scale efficiently.
  • Continuously identify performance bottlenecks and lead efforts to optimize system performance.
  • Champion security best practices and ensure that systems are designed and operated in compliance with industry standards and regulations.
  • Stay current with emerging technologies and industry trends.
  • Evaluate and introduce new tools and techniques to improve SRE practices and system reliability.

Requirements

  • 7+ years of hands-on experience as a Site Reliability Engineer or equivalent role.
  • 7+ years of development experience with at least one major programming language.
  • Expert-level knowledge of cloud computing platforms (AWS and Azure).
  • Proven ability to lead complex technical projects and initiatives.
  • Strong communication and collaboration skills, with the ability to influence and build consensus.
  • Deep understanding of observability, logging, and monitoring technologies.
  • Experience with a variety of RDBMS and NoSQL data stores.
  • Expertise in containerization technologies such as Docker and Kubernetes.
  • Expertise in infrastructure as code.
  • Experience designing and building RESTful APIs.
  • Extensive hands-on experience with monitoring and observability tooling (Datadog, Splunk, or similar).
  • Familiarity with Agile methodologies and practices.
  • Extensive experience in providing and leading critical application support in a 24/7/365 high-availability environment.
  • Experience with GitOps.
  • BA/BS degree in Computer Science or related technical field, or equivalent job experience.

This Senior Staff SRE role offers a unique opportunity to make a significant impact on the reliability and performance of WEX's critical Benefits systems. You will play a key role in shaping the future of SRE at WEX and driving innovation across the organization.




