Site Reliability Engineer

3 days ago


San Francisco, California, United States Roman Health Pharmacy LLC Full time
About the Role

We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based platform.

Key Responsibilities
  • Investigate operational surprises and support teams in post-incident activities
  • Conduct in-depth incident analysis and maximize post-incident learning across the organization
  • Complete short-term reliability consultancy and enablement engagements, such as SLO reviews and facilitating pre-mortems
  • Improve on-call health, uplift observability, and address operational hotspots when embedded with a product engineering portfolio
  • Identify, plan, and lead implementation of reliability uplift work and initiatives
  • Support delivery of strategic features and initiatives with reliability and distributed systems expertise
  • Observe and improve rituals and practices relating to production operations, incident response, and incident learning
Requirements
  • Solid experience in logging, monitoring, and observability of a highly distributed system
  • Experience leading incident management and response, including critical, complex, and high-severity incidents
  • Experience with post-incident reviews, incident analysis, and learning from incidents
  • Experience working in a tech or product company with comparable scale and complexity
  • Systems thinking and understanding of how systems and components interact and respond to failure
  • Proficiency in one or more object-oriented programming languages or experience with infrastructure-as-code
Preferred Qualifications
  • Experience working with cloud providers such as AWS, Azure, or GCP
  • Experience with designing, developing, and operating distributed systems and large-scale software systems
  • Strong experience delivering technical initiatives in an operational, site reliability, or platform engineering capacity
  • The ability to solve engineering challenges outside of your own team, using influence rather than authority to enact change
  • Demonstrated experience in reliability concepts, such as capacity management, autoscaling, deployment and release safety, software strategies for reliability, fault tolerance, and graceful failure
  • Experience implementing customer-focused Service Level Objectives (SLOs)
  • Experience using software engineering to solve operational and reliability challenges
  • Understanding of human factors, safety science, and resilience engineering
  • Experience working in environments with advanced security and networks
What We Offer

Xero offers a competitive salary range of $170,000 - $195,000 per year, as well as a range of benefits, including generous paid leave, dedicated paid leave for physical and mental wellbeing, an Employee Assistance Program, employee resource groups, wellbeing programming and allowances, medical, dental, vision, and disability insurance, fertility and family forming financial support, 401k contribution matching, 26 weeks of paid parental leave for primary caregivers, an Employee Share Plan, and beautiful offices with snacks and break areas.



  • San Francisco, California, United States Unreal Gigs Full time

    Job Title: Site Reliability Engineer. At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, scalability, and performance of our complex distributed systems. Key Responsibilities: Design and implement monitoring, logging, and alerting...


  • San Francisco, California, United States Resource Informatics Group Full time

    Job Title: Site Reliability Engineer. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Resource Informatics Group. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our large-scale Oracle database systems. Key Responsibilities: Administer and troubleshoot...


  • San Francisco, California, United States Wasmer Full time

    About the Role: We are seeking an exceptional Site Reliability Engineer to join our team at Wasmer. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure solutions for our Edge computing platform. Key Responsibilities: Design and implement scalable and reliable infrastructure...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a cutting-edge AI innovation company that empowers organizations to solve complex unstructured data problems. With a global presence and a customer-centric approach, we deliver top-tier solutions that provide unmatched advantages for everyday business operations. Job Title: Site Reliability Engineer. We are seeking a highly skilled...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index...


  • San Francisco, California, United States Apollo Solutions Full time

    Site Reliability Engineer. Apollo Solutions has partnered with a pioneering artificial intelligence business that is revolutionizing the use of AI/ML in gaming and security. The company is working closely with government contracts and gaming console companies and is seeking a Site Reliability Engineer to join their growing team. The Site Reliability Engineer...


  • San Francisco, California, United States Perplexity AI Full time

    Site Reliability Engineer. Perplexity AI is seeking a skilled Site Reliability Engineer to join our team and contribute to the development of our cutting-edge conversational answer engine. As a Site Reliability Engineer, you will be responsible for designing, implementing, and scaling the infrastructure and systems that support our web and mobile products. Key...


  • San Francisco, California, United States iTCO Solutions Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at iTCO Solutions. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and security of our cloud-based infrastructure. Key Responsibilities: Design and implement operational and infrastructural...


  • San Francisco, California, United States DaVita Full time

    About the Role: The WEX Site Reliability Engineering team is seeking a skilled Site Reliability Engineer to join our Platform Reliability organization. As a key member of our team, you will be responsible for developing software and solutions focused on observability, incident response, reliability, and performance. You will collaborate with our engineering...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about harnessing the power of AI innovation to democratize access to cutting-edge technology and empower organizations to solve complex unstructured data problems. With a strong presence in the market and a talented team, we're committed to delivering top-tier solutions that drive business success. Job...


  • San Francisco, California, United States Wasmer Full time

    About the Role: We are seeking an exceptional Site Reliability Engineer to join our team at Wasmer. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our Edge computing platform. Key Responsibilities: Design, implement, and maintain scalable and reliable infrastructure solutions for our Edge computing...


  • San Francisco, California, United States SpeedCast Full time

    Job Title: Site Reliability Engineer. At Speedcast, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based communication solutions. Key Responsibilities: Analyze and design continuous...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a cutting-edge AI innovation company that empowers organizations to solve complex unstructured data problems. With a global presence and a customer-centric approach, we deliver top-tier solutions that provide unmatched advantages for everyday business operations. Job Description: We are seeking a highly skilled Site Reliability...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a global company with offices in San Francisco, New York, London, and Bengaluru. We're a people-first organization that values experimentation, curiosity, and customer obsession. Job Summary: We're seeking a Site Reliability Engineer to join our Site Reliability and Platform Engineering team. As a key member of our team, you'll be...


  • San Francisco, California, United States Orb Full time

    About Orb: Orb is a cutting-edge billing infrastructure company that empowers businesses to unlock their revenue potential. We believe that pricing and billing should not be a barrier to innovation and growth. Role & Impact: As a Site Reliability Engineer at Orb, you will play a critical role in maintaining and scaling our robust infrastructure, ensuring...


  • San Francisco, California, United States Withorb Full time

    About Us: Orb is a cutting-edge technology company on a mission to revolutionize the way businesses approach revenue growth. Our team is passionate about building a robust infrastructure that enables our customers to unlock their full potential. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our...


  • San Francisco, California, United States BaseTen Labs, Inc. Full time

    About BaseTen Labs, Inc.: We're a rapidly growing team of innovators backed by top-tier investors, including IVP, Spark Capital, and Sarah Guo at Conviction. Our mission is to empower machine learning teams at enterprises and AI-native companies to build scalable, reliable, and efficient infrastructure. Job Description: We're seeking a skilled Site Reliability...


  • San Francisco, California, United States Outdefine Full time

    About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...