Customer Reliability Engineer

4 weeks ago


New York, New York, United States Replicated Full time

Replicated is a Commercial Software Distribution Platform. Replicated helps software vendors distribute their applications into self-hosted environments like VPC, on-prem, air gap, and more. With a suite of tools ranging from installation, to testing, to licensing and support, Replicated is the best way to operationalize and scale the distribution of Kubernetes applications into any enterprise environment.

Our customers include KNIME, Puppet, Smartbear, BigID, Swimlane, and many other fast-growing enterprise software vendors.

Replicated is committed to cultivating an efficient, respectful workplace. We know that innovation thrives on teams where diverse points of view come together to solve hard problems in ways that are just now possible. As such, we explicitly seek people who bring diverse life experiences, diverse educational backgrounds, diverse cultures, and diverse work experiences.

We are fully remote and plan to stay that way. We're open to any state in the US. In addition, for some roles, we're open to candidates in Canada, the UK, Australia, and New Zealand.

Replicated is expanding its global Customer Reliability Engineering (CRE) team, a group of dedicated engineers focused on helping our vendors successfully deliver and support Kubernetes applications in customer-managed environments. As a CRE, you'll be on the front lines, working directly with customers to solve complex technical challenges related to application deployment, management, and troubleshooting. You'll gain deep expertise in Kubernetes, the Replicated product suite, and the intricacies of customer-managed deployments, including scenarios where cluster installation is required. Due to the team's success and growing responsibilities, we're expanding to meet the increasing demand and serve as a critical interface between our products and their real-world use by customers. This role prioritizes exceptional support and customer success, collaborating closely with senior CREs and product engineers.

This role is perfect for you if you are passionate about problem-solving, enjoy helping people, and thrive on diving deep into technical challenges. You'll leverage your operational knowledge to build best practices and contribute to tooling that empowers both our internal teams and our vendors. This is an excellent opportunity to build a strong foundation in Kubernetes, Linux, and the broader cloud-native ecosystem, while learning from experienced engineers on a successful, growing team.

What you'll be doing:

  • Primary Focus: Provide expert support to customers, resolving issues related to Kubernetes, Linux, and Replicated products. This includes troubleshooting failures, identifying root causes, and implementing solutions. Every day will present new and unique challenges.
  • Enable Customer Success: Work proactively with customers to ensure they are successfully deploying, managing, and scaling their applications using Replicated. This includes providing guidance, best practices, training, and assisting with onboarding new applications.
  • Collaborate with Engineering: Work closely with senior CREs and product engineers to share customer feedback, identify product improvements, and contribute to the overall Replicated product roadmap. While this role doesn't require implementing code changes on day one, you'll be a key contributor in identifying areas for improvement, and the team regularly makes code contributions to enhance our products and tools. As you grow within the team, you'll have opportunities to develop your coding skills and contribute directly to these improvements.
  • Continuous Learning: Invest in your personal and professional growth. Replicated is committed to supporting your development through courses, certifications, and other learning opportunities.
To be successful in this role, you will need to bring:
  • 2+ years experience with Linux system administration
  • 2+ years experience with Kubernetes and Helm
  • Exceptional technical and non-technical communication and interpersonal skills. You must be able to clearly explain complex technical concepts to both technical and non-technical audiences.
  • Strong problem-solving skills and the ability to think critically under pressure.
  • A customer-centric mindset and a genuine desire to help others succeed.
Nice to haves:
  • Experience with CNCF tools
  • Familiarity with Go and the ability to debug Go programs
  • Customer facing experience
Note: This role does include some on-call support coverage. While we do our best to optimize for timezones and working hours, our global team is expanding to ensure we are available for our customers when they need us.

In your first 30 Days:

  • Immerse Yourself: Dedicate yourself to learning about Replicated - the company, the global CRE team, our products, and our customers (vendors).
  • Hands-on Training: Complete comprehensive hands-on training with the Replicated platform, working through a structured onboarding checklist.
  • Team Connections: Meet with team members across Replicated, including senior CREs, product engineers, and other departments, to build relationships and understand different perspectives.
  • Onboarding Improvement: As you go through the onboarding process, actively identify areas for improvement and suggest changes to make it even better for future CREs.
  • Active Support Participation: Begin working on real support cases from the queue, with direct oversight and guidance from senior CREs. This hands-on approach will accelerate your learning and understanding of customer issues and troubleshooting techniques.
In your first 60 days:
  • Deeper Support Immersion: Continue working on support cases, increasing the complexity and variety of issues you handle. Focus on understanding the "why" behind customer problems and the solutions implemented.
  • Process Improvement: Proactively suggest improvements to the support process, both technical (e.g., tooling, diagnostics) and procedural (e.g., communication workflows, escalation paths).
  • Product Knowledge Expansion: Deepen your understanding of how Replicated's products are developed, how different services interact, and how they are used in customer-managed environments.
  • Vendor Interaction: Begin to participate in some supervised customer interactions, gradually taking on more responsibility under the guidance of senior CREs.
  • Documentation Review: Review existing support documentation and training materials, identifying areas for updates or improvements.
In your first 90 days:
  • Independent Support: Take on full responsibility for handling support issues from the queue, working independently to diagnose, resolve, and prevent recurrence.
  • On-Call Rotation: Join the on-call rotation, providing 24/7 support coverage (primarily weekends due to the global team) for specific Replicated products. Remember, you're never alone - the team is always available to support you.
  • Customer Success Engagement: Begin actively participating in proactive customer success activities, such as assisting with onboarding new applications or providing best-practice guidance.
  • Feedback Loop: Become a key contributor to the feedback loop between customers and engineering, sharing insights and identifying areas for product improvement.
  • Continued Learning: Continue to invest in your personal and professional growth, leveraging Replicated's resources (like the curiosity budget) to expand your skills in Kubernetes, Linux, and other relevant technologies. Begin exploring opportunities to develop your Go coding skills.
At Replicated, we value our teammates as individuals who are stronger together. We offer a robust pay and benefits package that rewards employees for their contributions to our success, supports their well-being, and helps all of us create a great remote work environment.
In the US, the salary range for this role is as follows: $140,000 - $184,000.

For team members outside of the US, our salary ranges are at localized rates for the countries we support. This is dependent on several factors, including level, qualifications, and experience. We also offer stock options, a strong health insurance package, as well as a unique home office allowance & a professional development budget. An overview is on our careers page here: https://www.replicated.com/careers/

We invest in our team and love candidates who are eager to learn and grow. We have a fantastic team of highly collaborative individuals who enjoy learning, growing, and mentoring others.

OUR CORE VALUES

Care Deeply: Care deeply about the work that you do. Because of that, you are constantly learning and willing to go out on a limb, challenge assumptions, and go back to first principles.

Longterm: Treat every interaction as part of a 30-year relationship; you'll see everyone down the road again as customers, partners, or coworkers.

Curious: We're always learning, and we approach everyone and every problem with curiosity. When needed, we challenge assumptions and go back to first principles.

BENEFITS

We offer strong benefits to help you stay healthy and productive. For the US, our benefits are listed below:
  • Health/Dental/Vision
  • Life/AD&D
  • LTD/STD
  • FSA
  • 401K
  • Stock options
  • Partner perk programs
  • Generous time off; we expect you to take a minimum of 3 weeks off per year
  • Laptop and the accessories you need to get set up
  • Generous home office setup allowance or co-working space allowance - up to $10,000 per year
  • Curiosity Budget to help you keep learning and growing
Replicated is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants of all backgrounds and we work to make sure that all team members have an equal opportunity to succeed.

We do not accept unsolicited assistance from any headhunters, recruitment firms or any other third party for any of our job openings. Any unsolicited resumes sent from anyone other than the candidate, in any format, to any person at Replicated, will be considered Replicated property. Replicated will NOT pay a fee for any placement resulting from the receipt of an unsolicited resume.

#LI-Remote