Cloud Infrastructure Reliability Engineer

2 days ago


San Francisco, California, United States Crusoe Full time

About Crusoe

Crusoe is a pioneering company in the field of AI-first cloud infrastructure. Our mission is to align the future of computing with the future of the climate. We're redefining AI cloud infrastructure and are recognized as the 'gold standard' for reliability and performance.

About the Role

We're seeking an experienced SRE Manager to lead our 24/7 Site Reliability Engineering team. Your primary goal will be to ensure continuous availability and optimal performance of our cloud infrastructure, giving customers uninterrupted access to their GPUs.

Your Responsibilities

  • Design and implement advanced alerting and monitoring systems
  • Manage incident response and drive system improvements
  • Prioritize projects and streamline workflows to achieve rapid results
  • Collaborate with remote teams across time zones

Requirements

  • At least 3 years of experience building and managing a 24/7 technical support team in a cloud operations environment
  • Strong background in Linux, containerization technologies, and Kubernetes
  • Experience with Prometheus, VictoriaMetrics, and exporters running against bare-metal endpoints
  • Some experience with infrastructure as it relates to data center operations
  • Leadership & Communication: Demonstrated leadership ability and excellent communication skills
  • Problem-Solving & Adaptability: Robust problem-solving skills and adaptability in a fast-paced environment

Benefits

  • Hybrid work schedule
  • Competitive Paid Time Off
  • Industry-competitive pay ($120,000 - $180,000)
  • Retirement benefits
  • Healthcare benefits including Medical, Dental, and Vision


  • San Francisco, California, United States Federal Reserve Bank Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our dynamic team at the Federal Reserve Bank of San Francisco. As a key member of our National Integration Services team, you will play a critical role in designing and deploying cloud-based infrastructure solutions that meet the needs of our mission-critical systems. Your primary...


  • San Francisco, California, United States Federal Reserve Bank of San Francisco Full time

    We are the Federal Reserve Bank of San Francisco, a public servant with a mission to advance the nation's monetary, financial, and payment systems. The position of Sr./Lead Site Reliability Engineer at the Federal Reserve Bank of San Francisco involves working closely with Cash Application Delivery Services (ADS) development, QA, DevOps, and National IT...


  • San Francisco, California, United States Crusoe Energy Inc Full time

    About Crusoe Energy Inc.: Crusoe Energy Inc. is a pioneering technology company dedicated to unlocking the value of stranded energy resources through innovative computation solutions. We aim to harmonize the long-term interests of the climate with the future of global computing infrastructure. As data centers consume an exponentially growing power footprint to...


  • San Francisco, California, United States Parafin Inc Full time

    About Parafin Inc: Parafin Inc is a cutting-edge technology company that empowers small businesses by providing easy access to financial services through our innovative infrastructure platform. We are backed by prominent venture capitalists and have raised over $94M in equity and $200M in debt. We're seeking an experienced software engineer to join our...


  • San Francisco, California, United States ESL FACEIT Group Full time

    Job Description: We are seeking a talented Cloud Infrastructure Engineer to join our team at ESL FACEIT Group. As a key member of our infrastructure team, you will be responsible for designing, analyzing, and troubleshooting large-scale distributed systems. Responsibilities: Maintain and improve monitoring and observability tools, ensuring seamless performance...


  • San Francisco, California, United States Ellation, Inc. Full time

    We're seeking a highly skilled Staff Site Reliability Engineer to join our Data Engineering team at Ellation, Inc. This role is ideal for individuals with a strong background in site reliability engineering and a passion for ensuring the reliability, scalability, and performance of our data infrastructure. About the Role: This position will be responsible for...


  • San Francisco, California, United States VamosVentures Full time

    Transforming Data Storage for a Scalable Future: VamosVentures is seeking an exceptional Cloud Infrastructure Database Engineer to join our innovative team. In this role, you will have the opportunity to design and develop robust data storage systems that power our AI-powered spend platform. As a Cloud Infrastructure Database Engineer at VamosVentures, you will...


  • San Francisco, California, United States Philo Full time

    Job Summary: We are seeking an experienced Senior Cloud Infrastructure Engineer to join our team at Philo. As a key member of our infrastructure team, you will be responsible for designing, building, and maintaining cloud-based systems that support our streaming platform. About Philo: Philo is a leading provider of live and on-demand streaming services. Our...


  • San Francisco, California, United States Crusoe Full time

    Senior Cloud Software Engineer Role at Crusoe: Crusoe is pioneering the future of AI-first cloud infrastructure, with a mission to align computing with sustainable climate goals. Our company has established itself as a leading provider of trusted, reliable AI platform solutions for Fortune 500 companies. About the Company: We're redefining AI cloud infrastructure...


  • San Francisco, California, United States Crusoe Energy Inc Full time

    About Crusoe: Crusoe Energy Inc. is pioneering vertically integrated, purpose-built AI infrastructure solutions trusted by Fortune 500 companies to power their most advanced AI applications. We're redefining AI cloud infrastructure with a mission to align the future of computing with the future of the climate. Our AI platform is recognized as the "gold...


  • San Francisco, California, United States Amplitude Full time

    Amplitude is a leading digital analytics platform that helps companies unlock the power of their products. With over 3,500 customers, including Atlassian, Jersey Mike's, NBCUniversal, Shopify, and Under Armour, our platform provides self-service visibility into the entire customer journey. We approach challenges with humility, take ownership of our...


  • San Francisco, California, United States ZipRecruiter Full time

    We are seeking a highly skilled Cloud Infrastructure Engineer to join our team at ZipRecruiter. As a key member of our engineering team, you will be responsible for designing and implementing scalable, secure, and cost-effective cloud architectures using services from major cloud providers (AWS, GCP, Azure). You will ensure that cloud infrastructure aligns...


  • San Francisco, California, United States Tbwa ChiatDay Inc Full time

    About Postman: Postman is a leading technology company that simplifies and streamlines the development and testing of APIs. Our innovative team in the Bay Area is seeking a talented Cloud Engineer to join our collaborative and dynamic environment. Estimated Salary: $205,000 - $255,000+. Job Description: We are looking for an experienced Cloud Engineer to design,...


  • San Francisco, California, United States AirTree Ventures Pty Full time

    We are seeking a highly skilled Cloud Infrastructure Architect to join our team at AirTree Ventures Pty. As a key member of our engineering team, you will play a critical role in designing and maintaining our cloud infrastructure ecosystem. Key Responsibilities: Design, build, and maintain cloud infrastructure using infrastructure-as-code...


  • San Francisco, California, United States Unreal Gigs Full time

    Unreal Gigs is a forward-thinking company seeking a highly skilled Director of Cloud Infrastructure to lead our cloud strategy and infrastructure efforts. With a strong focus on innovation, scalability, and cost-effectiveness, this role presents an exciting opportunity for a seasoned professional to shape the backbone of our digital products and...


  • San Francisco, California, United States Anagh Technology Full time

    Job Overview: We are seeking a Cloud Infrastructure Engineer to join our team at Anagh Technology in Dallas, TX or Sunnyvale, CA. The ideal candidate will have extensive first-hand experience with cloud-based infrastructure engineering and a strong background in AWS architecture. About the Role: The successful candidate will be responsible for designing,...


  • San Francisco, California, United States CV Library Full time

    Overview: We are seeking a highly skilled Cloud Infrastructure Engineer to join our growing SRE team at Atlassian. The successful candidate will be responsible for scaling Cloud services and owning the caching infrastructure, tooling, and automation that support Atlassian's suite of Cloud products. Responsibilities: Analyzing and improving services and processes to...


  • San Francisco, California, United States Tbwa ChiatDay Inc Full time

    At Tbwa Chiat/Day Inc, we're seeking a skilled Cloud Infrastructure Architect to join our innovative team in the Bay Area. If you have a passion for cloud technologies, particularly Kubernetes, ArgoCD, Helm, and Crossplane, and thrive in a collaborative environment, we invite you to be part of our journey. Job Overview: We are looking for a talented Cloud...


  • San Francisco, California, United States Crusoe Energy Inc Full time

    Crusoe Energy Inc is revolutionizing the way companies approach AI-first cloud infrastructure. As a pioneer in vertically integrated, purpose-built AI infrastructure solutions, we're trusted by Fortune 500 companies to power their most advanced AI applications. Our mission at Crusoe is to align the future of computing with the future of the climate. Our AI...


  • San Francisco, California, United States Amplitude Full time

    About Amplitude: Amplitude is a leading digital analytics platform that empowers businesses to unlock the full potential of their products. With over 3,200 customers worldwide, including industry leaders such as Atlassian, NBCUniversal, and Under Armour, Amplitude helps organizations make data-driven decisions to drive growth. Job Summary: We are seeking an...