Customer Reliability Engineer

2 weeks ago


New York, New York, United States Replicated Full time

Replicated is a Commercial Software Distribution Platform. Replicated helps software vendors distribute their applications into self-hosted environments like VPC, on-prem, air gap, and more. With a suite of tools ranging from installation, to testing, to licensing and support, Replicated is the best way to operationalize and scale the distribution of Kubernetes applications into any enterprise environment.
Our customers include KNIME, Puppet, Smartbear, BigID, Swimlane, and many other fast-growing enterprise software vendors.
Replicated is committed to cultivating an efficient, respectful workplace. We know that innovation thrives on teams where diverse points of view come together to solve hard problems in ways that are just now possible. As such, we explicitly seek people who bring diverse life experiences, diverse educational backgrounds, diverse cultures, and diverse work experiences.
We are fully remote and plan to stay that way. We're open to any state in the US. In addition, for some roles, we're open to candidates in Canada, the UK, Australia, and New Zealand.
Replicated is expanding its global Customer Reliability Engineering (CRE) team, a group of dedicated engineers focused on helping our vendors successfully deliver and support Kubernetes applications in customer-managed environments. As a CRE, you'll be on the front lines, working directly with customers to solve complex technical challenges related to application deployment, management, and troubleshooting. You'll gain deep expertise in Kubernetes, the Replicated product suite, and the intricacies of customer-managed deployments, including scenarios where cluster installation is required. Due to the team's success and growing responsibilities, we're expanding to meet the increasing demand and serve as a critical interface between our products and their real-world use by customers. This role prioritizes exceptional support and customer success, collaborating closely with senior CREs and product engineers.
This role is perfect for you if you are passionate about problem-solving, enjoy helping people, and thrive on diving deep into technical challenges. You'll leverage your operational knowledge to build best practices and contribute to tooling that empowers both our internal teams and our vendors. This is an excellent opportunity to build a strong foundation in Kubernetes, Linux, and the broader cloud-native ecosystem, while learning from experienced engineers on a successful, growing team.
What you'll be doing:

Primary Focus: Provide expert support to customers, resolving issues related to Kubernetes, Linux, and Replicated products. This includes troubleshooting failures, identifying root causes, and implementing solutions. Every day will present new and unique challenges.

Enable Customer Success: Work proactively with customers to ensure they are successfully deploying, managing, and scaling their applications using Replicated. This includes providing guidance, best practices, training, and assisting with onboarding new applications.

Collaborate with Engineering: Work closely with senior CREs and product engineers to share customer feedback, identify product improvements, and contribute to the overall Replicated product roadmap. While this role doesn't require implementing code changes on day one, you'll be a key contributor in identifying areas for improvement, and the team regularly makes code contributions to enhance our products and tools. As you grow within the team, you'll have opportunities to develop your coding skills and contribute directly to these improvements.

Continuous Learning: Invest in your personal and professional growth. Replicated is committed to supporting your development through courses, certifications, and other learning opportunities.

To be successful in this role, you will need to bring:

2+ years of experience with Linux system administration

2+ years of experience with Kubernetes and Helm

Exceptional technical and non-technical communication and interpersonal skills. You must be able to clearly explain complex technical concepts to both technical and non-technical audiences.

Strong problem-solving skills and the ability to think critically under pressure.

A customer-centric mindset and a genuine desire to help others succeed.

Nice to haves:

Experience with CNCF tools

Familiarity with Go and the ability to debug Go programs

Customer-facing experience

Note: This role does include some on-call support coverage. While we do our best to optimize for timezones and working hours, our global team is expanding to ensure we are available for our customers when they need us.

In your first 30 days:

Immerse Yourself: Dedicate yourself to learning about Replicated - the company, the global CRE team, our products, and our customers (vendors).

Hands-on Training: Complete comprehensive hands-on training with the Replicated platform, working through a structured onboarding checklist.

Team Connections: Meet with team members across Replicated, including senior CREs, product engineers, and other departments, to build relationships and understand different perspectives.

Onboarding Improvement: As you go through the onboarding process, actively identify areas for improvement and suggest changes to make it even better for future CREs.

Active Support Participation: Begin working on real support cases from the queue, with direct oversight and guidance from senior CREs. This hands-on approach will accelerate your learning and understanding of customer issues and troubleshooting techniques.

In your first 60 days:

Deeper Support Immersion: Continue working on support cases, increasing the complexity and variety of issues you handle. Focus on understanding the "why" behind customer problems and the solutions implemented.

Process Improvement: Proactively suggest improvements to the support process, both technical (e.g., tooling, diagnostics) and procedural (e.g., communication workflows, escalation paths).

Product Knowledge Expansion: Deepen your understanding of how Replicated's products are developed, how different services interact, and how they are used in customer-managed environments.

Vendor Interaction: Begin to participate in some supervised customer interactions, gradually taking on more responsibility under the guidance of senior CREs.

Documentation Review: Review existing support documentation and training materials, identifying areas for updates or improvements.

In your first 90 days:

Independent Support: Take on full responsibility for handling support issues from the queue, working independently to diagnose, resolve, and prevent recurrence.

On-Call Rotation: Join the on-call rotation, providing 24/7 support coverage (primarily weekends due to the global team) for specific Replicated products. Remember, you're never alone - the team is always available to support you.

Customer Success Engagement: Begin actively participating in proactive customer success activities, such as assisting with onboarding new applications or providing best-practice guidance.

Feedback Loop: Become a key contributor to the feedback loop between customers and engineering, sharing insights and identifying areas for product improvement.

Continued Learning: Continue to invest in your personal and professional growth, leveraging Replicated's resources (like the curiosity budget) to expand your skills in Kubernetes, Linux, and other relevant technologies. Begin exploring opportunities to develop your Go coding skills.

At Replicated, we value our teammates as individuals who are stronger together. We offer a robust pay and benefits package that rewards employees for their contributions to our success, supports their well-being, and helps all of us create a great remote work environment.
In the US, the salary range for this role is $140,000 - $184,000.
For team members outside of the US, our salary ranges are at localized rates for the countries we support. This is dependent on several factors, including level, qualifications, and experience. We also offer stock options, a strong health insurance package, as well as a unique home office allowance & a professional development budget. An overview is on our careers page here: https://www.replicated.com/careers/
We invest in our team and love candidates who are eager to learn and grow. We have a fantastic team of highly collaborative individuals who enjoy learning, growing, and mentoring others.
OUR CORE VALUES

Care Deeply: Care deeply about the work that you do. Because of that, you are constantly learning and willing to go out on a limb, challenge assumptions, go back to first principles, etc.

Longterm: Treat every interaction as part of a 30-year relationship; you'll see everyone down the road again as customers, partners, coworkers, etc.

Curious: We're always learning, and we approach everyone and every problem with curiosity. When needed, we challenge assumptions and go back to first principles.

BENEFITS

We offer strong benefits to help you stay healthy and productive. For the US, our benefits are listed below:

Health/Dental/Vision

Life/AD&D

LTD/STD

FSA

401K

Stock options

Partner perk programs

Generous time off; we expect you to take a minimum of 3 weeks per year

Laptop and the accessories you need to get set up

Generous home office setup allowance or co-working space allowance - up to $10,000 per year

Curiosity Budget to help you keep learning and growing

Replicated is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants of all backgrounds and we work to make sure that all team members have an equal opportunity to succeed.
We do not accept unsolicited assistance from any headhunters, recruitment firms or any other third party for any of our job openings. Any unsolicited resumes sent from anyone other than the candidate, in any format, to any person at Replicated, will be considered Replicated property. Replicated will NOT pay a fee for any placement resulting from the receipt of an unsolicited resume.

