Senior Cloud Reliability Engineer

3 weeks ago

San Francisco, California, United States | Crusoe | Full time

About Crusoe Energy Systems:

We are a company on a mission to unlock value in stranded energy resources through the power of computation.

Our goal is to align the long-term interests of the climate with the future of global computing infrastructure.

Data centers consume an exponentially growing amount of power to deliver technology to all connected devices, and we aim to ensure that the energy meeting that demand is sourced in an environmentally responsible fashion.

We co-locate mobile data centers with stranded energy resources, like flare gas and underloaded renewables, to deliver low-cost, carbon-negative distributed computing solutions.

Our managed cloud services platform, Crusoe Cloud, is powered by stranded energy and enables climate-friendly innovation in computationally intensive fields, including artificial intelligence, graphics rendering, and computational biology.

About This Role:

Our Site Reliability Engineering (SRE) team plays a pivotal role in ensuring the reliability and performance of our infrastructure.

SRE at Crusoe is dedicated to detecting, analyzing, and preventing issues so that we consistently meet our Service Level Agreements (SLAs), tracked through Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

Through automation and proactive remediation, our SREs not only resolve common errors automatically but also advise various engineering teams in building resilient code.

We prioritize anticipating and resolving issues before they impact our customers, conducting thorough post-mortems, and driving continuous improvement.

Our customer-centric approach ensures that clients always have access to the virtual machines they depend on.

A Day in the Life:

As a Site Reliability Engineer at Crusoe Energy Systems, your day begins with a review of overnight alerts and system performance metrics to ensure everything is running smoothly.

You will collaborate with your team in a morning stand-up meeting to discuss ongoing projects, recent incidents, and priorities for the day.

Your tasks might include automating routine processes, analyzing system logs, and developing tools to enhance our monitoring capabilities.

You'll spend part of your day working closely with software engineers, advising on best practices for resilient code and reviewing changes before deployment.

Regularly, you will engage in incident response drills, post-mortems, and root cause analysis sessions to learn from past issues and prevent future ones.

Throughout the day, you will stay focused on maintaining high SLIs and SLOs, ensuring that our infrastructure remains robust and reliable for our customers.

By day's end, you will document your work, share insights with your team, and plan for the next day's challenges, always with a customer-centric mindset.

Requirements:

5+ years of professional SRE experience
5+ years of experience contributing to architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
Bachelor's Degree in Computer Science or related field, or 8+ years relevant work experience
Solid understanding of infrastructure design, including the operational trade-offs of various designs
Experience writing high-quality code with at least one programming language (Python, Go, or similar)
Experience building with modern infrastructure tools such as Docker, Kubernetes, Ansible, CloudFormation, and Terraform
Experience building with modern CI/CD practices and build systems, such as GitLab CI/CD, CircleCI, GitHub Actions
Experience with logging, monitoring, and alerting systems and tools
Experience with Unix/Linux environments
Experience with TCP/IP and network programming
Experience with information security best practices
Excellent communication skills
Must be able to pass a background check
Embody the Company values

Benefits:

Hybrid work schedule
Industry-competitive pay
Restricted Stock Units in a fast-growing, well-funded technology company
Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
Employer contributions to HSA accounts
Paid Parental Leave
Paid life insurance and short-term and long-term disability coverage
Teladoc
401(k) with a 100% match up to 4% of salary
Generous paid time off and holiday schedule
Cell phone reimbursement
Tuition reimbursement
Subscription to the Calm app
MetLife Legal
Company-paid commuter benefit; $50 per pay period

Compensation Range:

Compensation will be paid in the range of $183,000 - $250,000. Restricted Stock Units are included in all offers.

Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe Energy is an Equal Opportunity Employer.

Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.

