Site Reliability Engineer

4 weeks ago


San Francisco, California, United States xAI Full time
About xAI

xAI is a technology company that specializes in developing large-scale, highly reliable distributed systems. Our team of software engineers is passionate about building high-quality software and tackling complex technical challenges.

The Role

We are seeking an experienced Site Reliability Engineer to join our dynamic team in London. As a key member of our team, you will be responsible for improving our observability, building reliable alerts, designing on-call rotations, and enhancing our deployment process.

Main Responsibilities

  • Design and implement monitoring solutions to enhance our observability
  • Develop and maintain reliable alerting systems
  • Design and oversee on-call rotations to ensure seamless system operation
  • Improve our deployment process to increase system reliability

Requirements

  • Expert knowledge of at least one programming language that compiles to machine code (Rust, C++, or Go)
  • Proficiency in monitoring technologies (Prometheus, Grafana, and PagerDuty)
  • Experience with deployment technologies (Pulumi or Terraform)
  • Expert knowledge of Kubernetes

Location and Benefits

The role is based in our London office, with opportunities for work-from-home days and semi-regular business trips to California. We offer competitive cash-based compensation, xAI equity, private health and dental insurance, and unlimited time off subject to prior approval.



  • San Francisco, California, United States Unreal Gigs Full time

    Job Title: Site Reliability Engineer. At Unreal Gigs, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, scalability, and performance of our complex distributed systems. Key Responsibilities: Design and implement monitoring, logging, and alerting...


  • San Francisco, California, United States DaVita Full time

    About the Role: The WEX Site Reliability Engineering team is seeking a skilled Site Reliability Engineer to join our Platform Reliability organization. As a key member of our team, you will be responsible for developing software and solutions focused on observability, incident response, reliability, and performance. You will collaborate with our engineering...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about harnessing the power of AI innovation to democratize access to cutting-edge technology and empower organizations to solve complex unstructured data problems. With a strong presence in the market and a talented team, we're committed to delivering top-tier solutions that drive business success. Job...


  • San Francisco, California, United States Instabase Full time

    About Instabase: Instabase is a global company with offices in San Francisco, New York, London, and Bengaluru. We're a people-first organization that values experimentation, curiosity, and customer obsession. Job Summary: We're seeking a Site Reliability Engineer to join our Site Reliability and Platform Engineering team. As a key member of our team, you'll be...


  • San Francisco, California, United States Withorb Full time

    About Us: Orb is a cutting-edge technology company on a mission to revolutionize the way businesses approach revenue growth. Our team is passionate about building a robust infrastructure that enables our customers to unlock their full potential. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our...


  • San Francisco, California, United States Outdefine Full time

    About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...


  • San Francisco, California, United States Roman Health Pharmacy LLC Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities; Conduct in-depth...


  • San Francisco, California, United States YO HR CONSULTANCY Full time

    Job Title: Site Reliability Engineer. Job Description: At YO HR CONSULTANCY, we are seeking a highly skilled Site Reliability Engineer to join our team. Key Responsibilities: * Extensive experience working with Linux flavors like RHEL/CentOS, shells, filesystems, and utilities * Knowledge of distributed computing and experience working with container...


  • San Francisco, California, United States Orb Full time

    About the Role: Orb is seeking a skilled Site Reliability Engineer to join our team. As a key member of our engineering organization, you will play a critical role in maintaining and scaling our robust infrastructure, ensuring stability, scalability, and performance. You will be responsible for tackling complex engineering challenges, from scaling our data...


  • San Francisco, California, United States SpeedCast Full time

    Job Title: Site Reliability Engineer. At Speedcast, we're seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our communication products. Key Responsibilities: Analyze and design continuous integration/continuous delivery...


  • San Francisco, California, United States Swish Analytics Full time

    Site Reliability Engineer at Swish Analytics. Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...


  • San Francisco, California, United States GRNET Full time

    GRNET is seeking a highly skilled Site Reliability Engineer to join its team. As an SRE, you will be responsible for designing and implementing fault-tolerant, scalable, and distributed services. You will work closely with the team to bring your technical opinion and vision to the table, handle problems that require under-the-hood investigation, and lead...


  • San Francisco, California, United States Perplexity Full time

    Site Reliability Engineer. Perplexity is seeking a highly skilled Site Reliability Engineer to join our team in revolutionizing the way people interact with the internet. As a key member of our infrastructure team, you will be responsible for designing, implementing, and scaling the systems that support our web and mobile products. Key Responsibilities: Design...


  • San Francisco, California, United States WEX Full time

    Job Summary: The WEX Site Reliability Engineering team is seeking a highly motivated and quick-learning individual to join our team as a Site Reliability Engineer Level 1. As a key member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems. Key Responsibilities: Actively participate in training and...


  • San Francisco, California, United States SpeedCast Full time

    Job Summary: Speedcast is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the reliability of our cloud-based infrastructure. Key Responsibilities: Analyze and design continuous integration/continuous delivery pipelines to ensure seamless...


  • San Francisco, California, United States Crunchyroll Full time

    About Crunchyroll: We're a global entertainment company dedicated to delivering the art and culture of anime to a passionate community. Our mission is to help everyone belong, and we're looking for talented individuals to join our team. The Role: We're seeking a Staff Site Reliability Engineer to maintain and enhance the reliability of our data infrastructure. As...


  • San Francisco, California, United States https:www.energyjobline.comsitemap Full time

    About BestSecretGroup: We are a leading European members-only online destination for premium and luxury off-price fashion. Our tech-focused mindset and strong commitment to sustainability drive a unique experience for our members. With a rich history and a major tech transformation ahead, we are scaling at pace to become one of Europe's most exciting ecommerce...


  • San Francisco, California, United States Astranis Full time

    Astranis Mission: Astranis is revolutionizing global connectivity by developing the next generation of smaller, more cost-effective spacecraft. Our mission is to bridge the digital divide and connect the four billion people worldwide who lack internet access. Job Summary: We are seeking a highly motivated and experienced Senior Site Reliability Engineer to join...


  • San Francisco, California, United States Arbitrum Inc Full time

    Reliability Engineer. At Arbitrum Inc, we're on a mission to bring blockchain to a billion people. Our developer platform is designed to make building on the blockchain easy, and we're looking for a skilled Reliability Engineer to join our Infrastructure team. As a Reliability Engineer, you'll collaborate with our engineering team to design, deploy, and...