Site Reliability Engineer
5 days ago
An inclusive work environment is an empowering one. At Cutover, we lead with empathy and enable others to succeed through curiosity, kindness, and self-expression.
Location: Remote, United States
This role requires on-call shifts, roughly 1 in 4 weeks and 1 in 4 weekends. 2nd Shift: 2:00 PM - 11:00 PM PST (10:00 PM - 7:00 AM UTC)
Cutover provides enterprise technology operations teams with an AI-powered SaaS solution that automates and streamlines complex processes with intelligent runbooks. The Cutover solution enables teams to respond to incidents quickly, recover from IT outages, and manage cloud migrations with precision and efficiency. Cutover is used in many of the world's largest financial institutions to support their critical technology operations, including 5 out of the top 6 largest asset managers and 3 out of the top 5 US banks.
We're looking for a Site Reliability Engineer (SRE) to add to our US team. This role will report to our SRE Lead.
Cutover's SRE team is responsible for ensuring the reliability and performance levels of our production systems and applications. As a team, we're committed to constantly improving our engineering culture to maintain a balance between risk and reliability.
What tech stack do we use here at Cutover?
The platform is built on a ReactJS frontend with a Ruby on Rails API, all hosted on the reliable infrastructure of Amazon Web Services (AWS).
Your role will involve close collaboration with our support and engineering teams. Together, we actively engage in maintaining and optimizing the platform's reliability, utilizing cutting-edge tools and occasionally leveraging in-house software and scripts.
If you're passionate about ensuring the dependability and efficiency of complex systems and thrive in an environment where technologies like React, Ruby, AWS, Kubernetes, Terraform, Git, and Ansible are at the forefront, we invite you to join our team. Together, let's elevate the reliability of our Cutover Enterprise platform to new heights.
As a Site Reliability Engineer, here's what you'll be up to:
- Incident Response: Respond to incidents and alerts, triaging urgency and investigating root cause
- Documentation: Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
- Root Cause Analysis: Contribute to post-mortems and help identify long-term improvements under guidance
- Collaboration: Support cross-functional teams during investigations and post-incident reviews
- Observability: Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
- Automation: Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
- Development: Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
- System Reliability: Work on efforts to enhance the reliability and performance of our application and systems, ensuring optimal uptime and minimal disruptions.
- Infrastructure Optimization: Work closely with the development and platform engineering teams to optimize the infrastructure on AWS, ensuring scalability and efficiency.
Please note that this role involves a rotating on-call schedule, which will require occasional evening and weekend availability.
What we'd like you to bring to the table…
- A genuine excitement for complex problem solving within our tech stack, applying what you know to our unique problems.
- Familiarity with at least one scripting language such as Ruby, JavaScript, Python, Bash
- Experience with containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
- An eagerness to follow modern engineering practices and learn from others
- Familiarity with observability tools such as DataDog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
- Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.)
- A collaborative mindset with clear communication skills
- A willingness to ask questions to gain a better understanding of new or complex concepts
Nice to haves…
- Exposure to major incident response processes
- AWS Certified Cloud Practitioner or hands-on experience with cloud environments
The good stuff…
- We're excited to offer Share Options as part of our compensation package.
- 20 days of PTO per year + public holidays, and we want you to take all of them
- 3 volunteer days to use for any charitable/voluntary cause you would like.
- A top-tier private health insurance package.
- 401k contribution plan
- Work from home stipend
- A personal learning and development budget through Learnerbly. You'll be supported in your quest for knowledge, whatever that looks like to you.
- If you're thinking of starting or growing your family, then you'll be in great company - more than half of our team are parents and we've built a globally consistent parental leave approach that we're proud of.
- Employee Referral Scheme.
- Safeguarding the mental health of our teams is paramount for us. If you'd like to, then you'll be able to avail yourself of multiple Cutover mental health initiatives, from fully subsidised therapy sessions to subscriptions to leading wellbeing platforms.
Target compensation package: $120,000 - $130,000 base + stock options + benefits.
The final offer may vary from the target compensation package, taking into consideration factors such as your experience level and skill set. If we aren't aligned on salary at this stage, we'd still love to hear from you to better understand if there are more suitable opportunities at Cutover.
Diversity Statement - Empowering Our Teams
We encourage our team to bring their authentic selves to work, which we have found has strengthened workplace relationships and fostered a genuine sense of community.
If you are excited by this role, we invite you to apply. Even if your profile doesn't check all the boxes, please don't simply scroll past. We recognize that talent lies everywhere and that some demographic groups are more likely to apply for a "stretch role" than others. We are always open to different perspectives and professional backgrounds to keep Cutover's culture evolving and to ensure that we never stop learning.
Cutover is an Equal Opportunity Employer. Maintaining an equitable hiring process is imperative to our mission. All applicants are considered without regard to race, ethnicity, national origin, religion, sex, gender identity, sexual orientation, age, mental or physical disability, marital status, protected veteran or parental status.
Learn more about Life at Cutover, our Guiding Principles, and our latest news on LinkedIn.