Site Reliability Engineer

2 weeks ago


San Francisco, California, United States AEG Full time
About the Role

We are seeking an experienced Site Reliability Engineer to join our DevSecOps and Infrastructure team. The successful candidate will be responsible for supporting our enterprise infrastructure, optimizing incident response, and working with technical teams to improve overall workload resiliency.

Responsibilities
  • Support production systems and help triage issues during live sporting events
  • Monitor systems and respond to incidents to maintain SLOs/SLAs; review and follow up on production incidents
  • Write and review code, develop documentation, and debug live problems on complex distributed systems
  • Optimize and facilitate incident response, conduct root cause analysis and blameless retrospectives
  • Work closely with technical teams to implement, optimize, maintain, scale, and debug workloads on Kubernetes, using CI/CD, automation tools, and scripting languages to deliver tools and software that improve the reliability and scalability of services
Qualifications
  • 3+ years of experience in SRE-leaning DevOps or full SRE roles
  • 3+ years building CI/CD pipelines with GitHub Actions, GitLab CI/CD, or similar
  • Extensive experience with Kubernetes
  • Experience managing customer-facing systems in a 24/7 environment, including escalations
  • Experience with triage and escalation policies/protocols
  • Strong communication and documentation skills
  • Comfortable with scripting languages like Bash, Python, or similar
Preferred Qualifications
  • Networking and routing experience
  • Terraform in AWS to support global-scale services
  • Improving observability in an engineering organization
  • Past experience with PagerDuty or similar tools


  • San Francisco, California, United States Unreal Gigs Full time

    Job Title: Site Reliability Engineer. At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, scalability, and performance of our complex distributed systems. Key Responsibilities: Design and implement monitoring, logging, and alerting...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index...


  • San Francisco, California, United States DaVita Full time

    About the Role: The WEX Site Reliability Engineering team is seeking a skilled Site Reliability Engineer to join our Platform Reliability organization. As a key member of our team, you will be responsible for developing software and solutions focused on observability, incident response, reliability, and performance. You will collaborate with our engineering...


  • San Francisco, California, United States Roman Health Pharmacy LLC Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud-based platform. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities; conduct in-depth incident...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about harnessing the power of AI innovation to democratize access to cutting-edge technology and empower organizations to solve complex unstructured data problems. With a strong presence in the market and a talented team, we're committed to delivering top-tier solutions that drive business success. Job...


  • San Francisco, California, United States Wasmer Full time

    About the Role: We are seeking an exceptional Site Reliability Engineer to join our team at Wasmer. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and performance of our Edge computing platform. Key Responsibilities: Design, implement, and maintain scalable and reliable infrastructure solutions for our Edge computing...


  • San Francisco, California, United States SpeedCast Full time

    Job Title: Site Reliability Engineer. At Speedcast, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based communication solutions. Key Responsibilities: Analyze and design continuous...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a global company with offices in San Francisco, New York, London, and Bengaluru. We're a people-first organization that values experimentation, curiosity, and customer obsession. Job Summary: We're seeking a Site Reliability Engineer to join our Site Reliability and Platform Engineering team. As a key member of our team, you'll be...


  • San Francisco, California, United States Withorb Full time

    About Us: Orb is a cutting-edge technology company on a mission to revolutionize the way businesses approach revenue growth. Our team is passionate about building a robust infrastructure that enables our customers to unlock their full potential. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our...


  • San Francisco, California, United States BaseTen Labs, Inc. Full time

    About BaseTen Labs, Inc.: We're a rapidly growing team of innovators backed by top-tier investors, including IVP, Spark Capital, and Sarah Guo at Conviction. Our mission is to empower machine learning teams at enterprises and AI-native companies to build scalable, reliable, and efficient infrastructure. Job Description: We're seeking a skilled Site Reliability...


  • San Francisco, California, United States Outdefine Full time

    About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...


  • San Francisco, California, United States Roman Health Pharmacy LLC Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities; conduct in-depth...


  • San Francisco, California, United States YO HR CONSULTANCY Full time

    Job Title: Site Reliability Engineer. Job Description: At YO HR CONSULTANCY, we are seeking a highly skilled Site Reliability Engineer to join our team. Key Responsibilities: Extensive experience working with Linux flavors like RHEL/CentOS, shells, filesystems, and utilities; knowledge of distributed computing and experience working with container...


  • San Francisco, California, United States Orb Full time

    About the Role: Orb is seeking a skilled Site Reliability Engineer to join our team. As a key member of our engineering organization, you will play a critical role in maintaining and scaling our robust infrastructure, ensuring stability, scalability, and performance. You will be responsible for tackling complex engineering challenges, from scaling our data...


  • San Francisco, California, United States SpeedCast Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team at Speedcast. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our communication products. Key Responsibilities: Analyze and design continuous integration/continuous delivery...


  • San Francisco, California, United States SpeedCast Full time

    Job Title: Site Reliability Engineer. At Speedcast, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our communication products. Key Responsibilities: Analyze and design continuous integration/continuous delivery...


  • San Francisco, California, United States Swish Analytics Full time

    Site Reliability Engineer at Swish Analytics. Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...


  • San Francisco, California, United States GRNET Full time

    GRNET is seeking a highly skilled Site Reliability Engineer to join its team. As an SRE, you will be responsible for designing and implementing fault-tolerant, scalable, and distributed services. You will work closely with the team to bring your technical opinion and vision to the table, handle problems that require under-the-hood investigation, and lead...


  • San Francisco, California, United States smartrecruiters - JobBoard Full time

    Job Title: Senior Site Reliability Engineer. We are seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our engineering organization, you will be responsible for leading a team of site reliability engineers who work to keep Twitter reliable and scalable. Responsibilities: Lead a team of site reliability engineers to...