Site Reliability Engineer
5 days ago
An inclusive work environment is an empowering one. At Cutover, we lead with empathy and enable others to succeed through curiosity, kindness, and self-expression.
Location: Remote, United States (candidates should be based in ET or -1 ET)
2nd Shift: 2:00 PM - 11:00 PM PST (10:00 PM - 7:00 AM UTC)
Cutover provides enterprise technology operations teams with an AI-powered SaaS solution that automates and streamlines complex processes with intelligent runbooks. The Cutover solution enables teams to respond to incidents quickly, recover from IT outages, and manage cloud migrations with precision and efficiency. Cutover is used in many of the world's largest financial institutions to support their critical technology operations, including 5 out of the top 6 largest asset managers and 3 out of the top 5 US banks.
We're looking for a Site Reliability Engineer (SRE) to add to our US team. This role will report to our SRE Lead.
Cutover's SRE team is responsible for ensuring the reliability and performance levels of our production systems and applications. As a team, we're committed to constantly improving our engineering culture to maintain a balance between risk and reliability.
What tech stack do we use here at Cutover?
The platform is built on a ReactJS frontend with a Ruby on Rails API, all hosted on Amazon Web Services (AWS).
Your role will involve close collaboration with our support and engineering teams. Together, we actively engage in maintaining and optimizing the platform's reliability, utilizing cutting-edge tools and occasionally leveraging in-house software and scripts.
If you're passionate about ensuring the dependability and efficiency of complex systems and thrive in an environment where technologies like React, Ruby, AWS, Kubernetes, Terraform, Git, and Ansible are at the forefront, we invite you to join our team. Together, let's elevate the reliability of our Cutover Enterprise platform to new heights.
As a Site Reliability Engineer, here's what you'll be up to:
- Incident Response: Respond to incidents and alerts, triaging urgency and investigating root cause
- Documentation: Contribute regularly to our documentation on system design, troubleshooting, best practices, and engineering processes
- Root Cause Analysis: Contribute to post-mortems and help identify long-term improvements under guidance
- Collaboration: Support cross-functional teams during investigations and post-incident reviews
- Observability: Support and enhance observability tools and techniques by identifying metrics, logging, and alerting improvements
- Automation: Write and execute simple automation scripts (e.g. Python, Ruby, Bash) to improve reliability and reduce toil
- Development: Work on internal tools, pipelines, and IaC solutions to help improve the speed of software delivery and recovery
- System Reliability: Work on efforts to enhance the reliability and performance of our application and systems, ensuring optimal uptime and minimal disruptions.
- Infrastructure Optimization: Work closely with the development and platform engineering teams to optimize the infrastructure on AWS, ensuring scalability and efficiency.
Please note that this role involves a rotating on-call schedule, which will require occasional evening and weekend availability.
What we'd like you to bring to the table…
- A genuine excitement for complex problem solving within our tech stack, applying what you know to our unique problems.
- Familiarity with at least one scripting language such as Ruby, JavaScript, Python, Bash
- Experience with containerization (e.g. Docker) or IaC (e.g. Terraform, Helm, CloudFormation)
- An eagerness to follow modern engineering practices and learn from others
- Familiarity with observability tools such as DataDog, New Relic, Grafana, Prometheus, ELK, or OpenTelemetry
- Understanding of core networking concepts (DNS, HTTP/S, Load Balancing, etc.)
- A collaborative mindset with clear communication skills
- A willingness to ask questions to gain a better understanding of new or complex concepts
Nice to haves…
- Exposure to major incident response processes
- AWS Certified Cloud Practitioner or hands-on experience with cloud environments
The good stuff…
- We're excited to offer Share Options as part of our compensation package.
- 20 days of PTO per year + public holidays, and we want you to take all of them
- 3 volunteer days to use for any charitable/voluntary cause you would like.
- A top-tier private health insurance package.
- 401k contribution plan
- Work from home stipend
- A personal learning and development budget through Learnerbly. You'll be supported in your quest for knowledge, whatever that looks like to you.
- If you're thinking of starting or growing your family, then you'll be in great company - more than half of our team are parents and we've built a globally consistent parental leave approach that we're proud of.
- Employee Referral Scheme.
- Safeguarding the mental health of our teams is paramount for us. If you'd like to, then you'll be able to avail yourself of multiple Cutover mental health initiatives, from fully subsidised therapy sessions to subscriptions to leading wellbeing platforms.
Target compensation package:
$120,000 - $130,000 base + stock options + benefits.
The final offer may vary from the target compensation package, taking into consideration factors such as your experience level and skill set. If we aren't aligned on salary at this stage, we'd still love to hear from you to better understand if there are more suitable opportunities at Cutover.
Diversity Statement - Empowering Our Teams
We encourage our team to bring their authentic selves to work, which we have found has strengthened workplace relationships and fostered a genuine sense of community.
If you are excited by this role, we invite you to apply.
Even if your profile doesn't check all the boxes, please don't simply scroll past. We recognize that talent lies everywhere and that some demographic groups are more likely to apply for a "stretch role" than others. We are always open to different perspectives and professional backgrounds to keep Cutover's culture evolving and to ensure that we never stop learning.
Cutover is an Equal Opportunity Employer. Maintaining an equitable hiring process is imperative to our mission. All applicants are considered without regard to race, ethnicity, national origin, religion, sex, gender identity, sexual orientation, age, mental or physical disability, marital status, protected veteran or parental status.
Learn more about Life at Cutover, our Guiding Principles, and our latest news on LinkedIn.