Senior Site Reliability Engineer
2 weeks ago
About
Our global computing platform launched in 2019, enabling businesses to programmatically deploy single-tenant Bare Metal instances in different parts of the world.
We are a team of individuals passionate about hardware, software, and network infrastructure, looking to build the fastest, easiest-to-use, developer-centric single-tenant cloud infrastructure. If you share this passion, join our growing team of talented people and help build the future of the Internet.
Summary
The Reliability team is responsible for the health and resilience of the infrastructure that powers our global bare metal cloud. As a Senior Site Reliability Engineer (SRE), you'll focus on building reliable, observable, and self-healing systems at scale.
Our SREs work at the intersection of software engineering and infrastructure. You'll design and implement tools that automate operations, improve incident response, and enhance system observability, ensuring our platform is always ready for our customers' workloads.
This might be a good opportunity if you're passionate about reliability, automation, and creating cloud-like experiences for bare metal infrastructure.
Key Responsibilities
- Continuously improve the platform's reliability and performance
- Design, build, and maintain tools to automate operational tasks and incident response
- Implement and improve observability solutions, including monitoring, alerting, and tracing
- Collaborate with engineering and platform teams to design scalable and resilient systems
- Participate in on-call rotations and lead post-incident reviews with a focus on learning
- Develop and document processes and runbooks that ensure operational excellence
- Contribute to defining SLOs/SLIs and to the adoption of reliability metrics across teams
Skills and Qualifications
- Strong verbal and written English communication skills
- Advanced knowledge of Linux/Unix systems in production environments
- Experience with Kubernetes and container orchestration
- Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
- Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
- Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
- Working knowledge of Git and CI/CD pipelines
- Solid understanding of incident management and root cause analysis processes
- Knowledge of cloud-native reliability and security best practices
What do we offer?
- Contractor (PJ)
- Paid Time Off
- Competitive Compensation
- Wellhub (formerly Gympass)
- Annual Bonus based on company and team performance
- Flexible work hours
- Opportunities for professional growth and development
Why join us?
We're a lean, agile team of passionate professionals who believe in the power of innovation and creative problem-solving. As part of our team, you won't be lost in the crowd – you'll be an essential contributor, making a real impact from day one.
Our values guide us in all our work and partnerships. We're proud to be an inclusive company, and we welcome all applicants for our open positions, regardless of their background, religion, sexual orientation, gender identity, age, nationality, or disability. If these values speak to you, we'd love for you to become a part of our team.