Senior Cloud Reliability Engineer
3 weeks ago
About Crusoe Energy Systems:
We are a company on a mission to unlock value in stranded energy resources through the power of computation.
Our goal is to align the long-term interests of the climate with the future of global computing infrastructure.
Data centers draw an exponentially growing amount of power to deliver technology to all connected devices, and we aim to ensure that the energy meeting that demand is sourced in an environmentally responsible fashion.
We co-locate mobile data centers with stranded energy resources, like flare gas and underloaded renewables, to deliver low-cost, carbon-negative distributed computing solutions.
Our managed cloud services platform, Crusoe Cloud, is powered by stranded energy and enables climate-friendly innovation in computationally intensive fields, including artificial intelligence, graphics rendering, and computational biology.
About This Role:
Our Site Reliability Engineering (SRE) team plays a pivotal role in ensuring the reliability and performance of our infrastructure.
SRE at Crusoe is dedicated to detecting, analyzing, and preventing issues to maintain high Service Level Agreement (SLA) compliance through Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
Through automation and proactive remediation, our SREs not only resolve common errors automatically but also advise various engineering teams in building resilient code.
We prioritize anticipating and resolving issues before they impact our customers, conducting thorough post-mortems, and driving continuous improvement.
Our customer-centric approach ensures that clients always have access to the virtual machines they depend on.
A Day in the Life:
As a Site Reliability Engineer at Crusoe Energy Systems, you begin your day with a review of overnight alerts and system performance metrics to ensure everything is running smoothly.
You will collaborate with your team in a morning stand-up meeting to discuss ongoing projects, recent incidents, and priorities for the day.
Your tasks might include automating routine processes, analyzing system logs, and developing tools to enhance our monitoring capabilities.
You'll spend part of your day working closely with software engineers, advising on best practices for resilient code and reviewing changes before deployment.
Regularly, you will engage in incident response drills, post-mortems, and root cause analysis sessions to learn from past issues and prevent future ones.
Throughout the day, you will stay focused on maintaining high SLIs and SLOs, ensuring that our infrastructure remains robust and reliable for our customers.
By day's end, you will document your work, share insights with your team, and plan for the next day's challenges, always with a customer-centric mindset.
Requirements:
- 5+ years of professional SRE experience
- 5+ years of experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and current systems
- Bachelor's Degree in Computer Science or related field, or 8+ years relevant work experience
- Solid understanding of infrastructure design, including the operational trade-offs of various designs
- Experience writing high-quality code with at least one programming language (Python, Go, or similar)
- Experience building with modern infrastructure tools such as Docker, Kubernetes, Ansible, CloudFormation, and Terraform
- Experience building with modern CI/CD practices and build systems, such as GitLab CI/CD, CircleCI, and GitHub Actions
- Experience with logging, monitoring, and alerting systems and tools
- Experience with Unix/Linux environments
- Experience with TCP/IP and network programming
- Experience with information security best practices
- Excellent communication skills
- Must be able to pass a background check
- Embody the Company values
Benefits:
- Hybrid work schedule
- Industry-competitive pay
- Restricted Stock Units in a fast-growing, well-funded technology company
- Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
- Employer contributions to HSA accounts
- Paid Parental Leave
- Paid life insurance and short-term and long-term disability coverage
- Teladoc
- 401(k) with a 100% match up to 4% of salary
- Generous paid time off and holiday schedule
- Cell phone reimbursement
- Tuition reimbursement
- Subscription to the Calm app
- MetLife Legal
- Company-paid commuter benefit; $50 per pay period
Compensation Range:
Compensation will be paid in the range of $183,000 - $250,000. Restricted Stock Units are included in all offers.
Compensation to be determined by the applicant's education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.
Crusoe Energy is an Equal Opportunity Employer.
Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
-
Senior Cloud Reliability Engineer
3 weeks ago
San Francisco, California, United States Cribl, Inc Full time Cribl Inc is seeking a Senior Cloud Reliability Engineer to join our mission to unlock the value of all observability data. Cribl provides users a new level of observability, intelligence and control over their real-time data. You will join a team of technical engineers who are committed to shipping only high-quality software and enjoying all the goat gifs the...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Francisco, California, United States Atlassian Full time Overview: We are seeking a highly skilled Senior Cloud Reliability Engineer to join our growing SRE team at Atlassian. As a key member of our team, you will be responsible for designing, implementing, and maintaining scalable and reliable cloud infrastructure that supports our suite of cloud products. The ideal candidate will have a strong background in cloud...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Francisco, California, United States Varo Bank Full time Varo Bank's cloud infrastructure is a complex system that requires a high level of reliability and availability. As a Senior Cloud Reliability Engineer, you will be responsible for designing and maintaining disaster recovery scenarios, ensuring that our systems are always up and running. We are looking for a skilled engineer who can write and maintain...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Francisco, California, United States Crusoe Energy Inc Full time About Crusoe Energy Inc: Crusoe Energy Inc is a pioneering company that is revolutionizing the way we approach energy resources. Our mission is to unlock value in stranded energy resources through the power of computation. Job Summary: We are seeking a highly skilled Senior/Staff Site Reliability Engineer to join our team. As a key member of our engineering team,...
-
Senior Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Tampa Gardens Senior Living Full time About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our Cloud Infrastructure Team. As a key member of our team, you will be responsible for deploying, managing, optimizing, and upgrading the systems that run Sight Machine software. You will work closely with our Development Engineering team to ensure the stability,...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Francisco, California, United States Crusoe Full time About Crusoe Energy: Crusoe Energy is a pioneering company that aims to unlock value in stranded energy resources through the power of computation. Our mission is to align the long-term interests of the climate with the future of global computing infrastructure. Job Description: We are seeking a highly skilled Senior/Staff Site Reliability Engineer to join our...
-
Senior Cloud Engineer
4 weeks ago
San Francisco, California, United States TBWA\Chiat\Day Full time About Scout Motors: Scout Motors is a pioneering company that is revolutionizing the electric pick-up truck and rugged SUV marketplace. We're a team of innovators, entrepreneurs, and visionaries who are passionate about shaping the future of transportation. Job Summary: We're seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key...
-
Senior Cloud Engineer
3 weeks ago
San Francisco, California, United States Eateam Full time Role: As a key member of Eateam's infrastructure team, we are seeking a highly skilled Senior Cloud Engineer to lead our cloud platform engineering efforts. Responsibilities: Design and deploy virtualization architectures, including VMware, OpenShift, or KubeVirt platforms. Evaluate existing application architectures and identify opportunities for...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Francisco, California, United States Crusoe Full time About Crusoe Energy Systems: Crusoe Energy Systems is a pioneering company that's revolutionizing the way we approach energy resources. Our mission is to unlock value in stranded energy resources through the power of computation. We're driven by a vision to align the long-term interests of the climate with the future of global computing infrastructure. As data...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Jose, California, United States Tik Tok Full time Senior Site Reliability Engineer, Global E-commerce: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy to our users. As a Senior Site Reliability Engineer on our Global E-commerce team, you will play a critical role in ensuring the reliability and scalability of our e-commerce platform. Key...
-
Senior Cloud Engineer
4 weeks ago
San Francisco, California, United States Ansa Full time About the Role: We are seeking a highly skilled Senior Cloud Engineer to join our team at Ansa. As a key member of our engineering team, you will be responsible for designing and implementing scalable, reliable, and secure cloud-based systems. Your primary focus will be on building and maintaining our cloud infrastructure, ensuring seamless integration with our...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States AEG Full time About the Role: We are seeking a highly skilled Site Reliability Engineer to join our DevSecOps and Infrastructure team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Responsibilities: Design, implement, and maintain scalable and highly available cloud...
-
Senior Cloud Infrastructure Engineer
4 weeks ago
San Francisco, California, United States Humane USA Full time About the Role: At Humane USA, we're seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our infrastructure team, you will be responsible for designing, building, and maintaining our cloud infrastructure to ensure high availability, scalability, and reliability. Key Responsibilities: Architect and implement cloud...
-
Senior Cloud Reliability Engineer
4 weeks ago
San Francisco, California, United States Crusoe Energy Systems Full time About This Role: At Crusoe Energy Systems, our Site Reliability Engineering team plays a pivotal role in ensuring the reliability and performance of our infrastructure. SRE at Crusoe is dedicated to detecting, analyzing, and preventing issues to maintain high Service Level Agreement through Service Level Indicators (SLIs) and Service Level Objectives...
-
Senior Cloud Engineer
4 weeks ago
San Francisco, California, United States Early Warning Services Full time Job Title: Senior Cloud Engineer. We are seeking a highly skilled Senior Cloud Engineer to join our team at Early Warning Services. As a Senior Cloud Engineer, you will be responsible for designing, implementing, and maintaining large-scale cloud-based systems that meet the needs of our business. Key Responsibilities: Design and implement cloud-based solutions...
-
Senior Cloud Software Engineer
3 weeks ago
San Francisco, California, United States Amazon Full time Job Title: Senior Cloud Software Engineer. About the Role: We are seeking a highly skilled Senior Cloud Software Engineer to join our team at Amazon. As a key member of our team, you will be responsible for designing, developing, and deploying cloud services that leverage AI and machine learning techniques for our Smart Eyewear product. Key Responsibilities: •...
-
Senior Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Twitter Full time Job Summary: Twitter is seeking a Senior Site Reliability Engineer to lead a team of engineers working to keep our services reliable and scalable. The ideal candidate will have experience managing services in a distributed environment and be comfortable working with on-prem and cloud-based infrastructure. Responsibilities: Lead a team of site reliability...
-
Site Reliability Engineer
4 weeks ago
San Francisco, California, United States Circle Full time About the Role: Circle is a financial technology company at the forefront of the emerging internet of money, where value can flow freely and securely across borders. As a Senior Site Reliability Engineer, you will play a critical role in designing, building, and maintaining Circle's cloud infrastructure to meet the growing needs of our worldwide customer...
-
Senior Cloud Infrastructure Engineer
4 weeks ago
San Francisco, California, United States Waabi Full time Senior Cloud Infrastructure Engineer: Waabi is seeking a highly skilled Senior Cloud Infrastructure Engineer to join our team. As a key member of our Infrastructure team, you will be responsible for designing, implementing, and troubleshooting cloud systems to support our AI-first approach to self-driving technology. Key Responsibilities: Collaborate with the...
-
Senior Cloud Infrastructure Engineer
4 weeks ago
San Francisco, California, United States Eateam Full time Role: As a Senior Cloud Infrastructure Engineer at Eateam, you will be responsible for designing and deploying virtualization architectures, including VMware, OpenShift, or KubeVirt platforms. You will also evaluate existing application architectures and identify opportunities for containerization to improve scalability, reliability, and...