Site Reliability Engineer

5 days ago

Remote, Oregon, United States Cutover Full time

An inclusive work environment is an empowering one. At Cutover, we lead with empathy and enable others to succeed through curiosity, kindness, and self-expression.

Location: Remote, United States

This role requires on-call shifts, roughly 1 in 4 weeks and 1 in 4 weekends - 2nd Shift: 2:00 PM - 11:00 PM PST (10:00 PM - 7:00 AM UTC)

Cutover provides enterprise technology operations teams with an AI-powered SaaS solution that automates and streamlines complex processes with intelligent runbooks. The Cutover solution enables teams to respond to incidents quickly, recover from IT outages, and manage cloud migrations with precision and efficiency. Cutover is used in many of the world's largest financial institutions to support their critical technology operations, including 5 out of the top 6 largest asset managers and 3 out of the top 5 US banks.

We're looking for a Site Reliability Engineer (SRE) to add to our US team. This role will report to our SRE Lead.

Cutover's SRE team is responsible for ensuring the reliability and performance levels of our production systems and applications. As a team, we're committed to constantly improving our engineering culture to maintain a balance between risk and reliability.

What tech stack do we use here at Cutover?

The platform is built on a ReactJS frontend with a Ruby on Rails API, all hosted on the reliable infrastructure of Amazon Web Services (AWS).

Your role will involve close collaboration with our support and engineering teams. Together, we actively engage in maintaining and optimizing the platform's reliability, utilizing cutting-edge tools and occasionally leveraging in-house software and scripts.

If you're passionate about ensuring the dependability and efficiency of complex systems and thrive in an environment where technologies like React, Ruby, AWS, Kubernetes, Terraform, Git, and Ansible are at the forefront, we invite you to join our team. Together, let's elevate the reliability of our Cutover Enterprise platform to new heights.

As a Site Reliability Engineer, here's what you'll be up to:

Incident Response: Respond to incidents and alerts, triaging urgency and investigating root cause
Documentation: Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
Root Cause Analysis: Contribute to post-mortems and help identify long-term improvements under guidance
Collaboration: Support cross-functional teams during investigations and post-incident reviews
Observability: Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
Automation: Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
Development: Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
System Reliability: Work on efforts to enhance the reliability and performance of our application and systems, ensuring optimal uptime and minimal disruptions.
Infrastructure Optimization: Work closely with the development and platform engineering teams to optimize the infrastructure on AWS, ensuring scalability and efficiency.

Please note that this role involves a rotating on-call schedule, which will require occasional evening and weekend availability.

What we'd like you to bring to the table…

A genuine excitement for complex problem solving within our tech stack, applying what you know to our unique problems.
Familiarity with at least one scripting language such as Ruby, JavaScript, Python, Bash
Experience with containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
An eagerness to follow modern engineering practices and learn from others
Familiarity with observability tools such as DataDog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.)
A collaborative mindset with clear communication skills
A willingness to ask questions to gain a better understanding of new or complex concepts

Nice to haves…

Exposure to major incident response processes
AWS Certified Cloud Practitioner or hands-on experience with cloud environments

The good stuff…

We're excited to offer Share Options as part of our compensation package.
20 days of PTO per year + public holidays, and we want you to take all of them
3 volunteer days to use for any charitable/voluntary cause you would like.
A top-tier private health insurance package.
401k contribution plan
Work from home stipend
A personal learning and development budget through Learnerbly. You'll be supported in your quest for knowledge, whatever that looks like to you.
If you're thinking of starting or growing your family, then you'll be in great company - more than half of our team are parents and we've built a globally consistent parental leave approach that we're proud of.
Employee Referral Scheme.
Safeguarding the mental health of our teams is paramount for us. If you'd like to, then you'll be able to avail yourself of multiple Cutover mental health initiatives, from fully subsidised therapy sessions to subscriptions to leading wellbeing platforms.

Target compensation package: $120,000 - $130,000 base, + stock options + benefits.

The final offer may vary from the target compensation package, taking into consideration factors such as your experience level and skill set. If we aren't aligned on salary at this stage, we'd still love to hear from you to better understand if there are more suitable opportunities at Cutover.

Diversity Statement - Empowering Our Teams

We encourage our team to bring their authentic selves to work, which we have found has strengthened workplace relationships and fostered a genuine sense of community.

If you are excited by this role, we invite you to apply. Even if your profile doesn't check all the boxes, please don't simply scroll past. We recognize that talent lies everywhere and that some demographic groups are more likely to apply for a "stretch role" than others. We are always open to different perspectives and professional backgrounds to keep Cutover's culture evolving and to ensure that we never stop learning.

Cutover is an Equal Opportunity Employer. Maintaining an equitable hiring process is imperative to our mission. All applicants are considered without regard to race, ethnicity, national origin, religion, sex, gender identity, sexual orientation, age, mental or physical disability, marital status, protected veteran or parental status.

Learn more about Life at Cutover, our Guiding Principles, and our latest news on LinkedIn.

Site Reliability Engineer

2 weeks ago

Remote, Oregon, United States ADT Full time $200,000 - $250,000 per year

ADT is transitioning to an in-office model. New team members will work from home but should plan to return to an in-office model at a later date. We will keep you well informed and supported throughout the transition. Summary: We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for...
Site Reliability Engineer

2 weeks ago

Remote, Oregon, United States JWay Group Full time

Sr. Site Reliability Engineer, Stack Management. As a Site Reliability Engineer, you will be responsible for architecting, maintaining, and managing our client's infrastructure, which includes solving some of the most challenging cloud access and data security problems for enterprise customers. Job Responsibilities: Maintain and support existing IT infrastructure...
Staff Site Reliability Engineer

2 weeks ago

Remote, Oregon, United States AlphaSense Full time

About AlphaSense: The world's most sophisticated companies rely on AlphaSense to remove uncertainty from decision-making. With market intelligence and search built on proven AI, AlphaSense delivers insights that matter from content you can trust. Our universe of public and private content includes equity research, company filings, event transcripts, expert...
Site Reliability Engineer

6 days ago

Remote, Oregon, United States 2Prod Technologies Corp. Full time

About 2Prod: 2Prod Technologies Corp. supports the federal government in delivering secure, scalable cloud solutions that advance critical national missions. Position Summary: 2Prod Technologies Corp. is seeking a Site Reliability Engineer (SRE) with strong GitLab expertise to support and enhance enterprise platforms. This role will focus primarily on GitLab...
Lead Site Reliability Engineer

4 days ago

Remote, Oregon, United States Canary Technologies Corp Full time

About Us: Canary Technologies is changing the game for hotels with modern software powered by Canary's hospitality-specific AI platform. Canary is utilized by 20,000+ hoteliers in 100+ countries to equip hoteliers with the technology they need to work smarter and wow their guests. Major hotel brands such as Wyndham, Marriott, IHG, Four Seasons, Rosewood, and...
Senior Site Reliability Engineer

7 days ago

Remote, Oregon, United States Fortress Information Security Full time

Senior Site Reliability Engineer. Location: Remote. Compensation: $160, ,000 per year, depending on experience and qualifications. Employment Type: Full-Time. What you can expect as the Senior Site Reliability Engineer at Fortress… The Senior Site Reliability Engineer is responsible for ensuring the reliability, performance, and scalability of critical systems and...
Senior Site Reliability Engineer

6 days ago

Remote, Oregon, United States Maxihost Full time

About 's global computing platform was launched in 2019, enabling businesses to programmatically deploy single-tenant Bare Metal instances in different parts of the world. We are a team of passionate individuals about hardware, software, and network infrastructure looking to build the fastest, easiest-to-use, developer-centric single-tenant Cloud...
Senior Site Reliability Engineer

5 days ago

Remote, Oregon, United States Granicus Full time

The Company: Serving the People Who Serve the People. Granicus is driven by the excitement of building, implementing, and maintaining technology that is transforming the Govtech industry by bringing governments and their constituents together. We are on a mission to support our customers with meeting the needs of their communities and implementing our technology...
Global Head of Site Reliability Engineering

4 days ago

Remote, Oregon, United States Socure Full time

Why Socure? At Socure, we're on a mission: to verify 100% of good identities in real time and eliminate identity fraud from the internet. Using predictive analytics and advanced machine learning trained on billions of signals to power RiskOS, Socure has created the most accurate identity verification and fraud prevention platform in the world. Trusted by...
Senior Cloud Site Reliability Engineer

4 days ago

Remote, Oregon, United States Installation Made Easy, Inc Full time

Position Title: Senior Cloud Site Reliability Engineer (Azure). Department: Information Technology. Location: Remote. Reports To: Platform DevOps Team Lead. Installation Made Easy ("IME") provides software and process management that enable retailers and contractors to offer installed home improvements to homeowners in a convenient, consistent, and affordable...
