Customer Reliability Engineer
4 weeks ago
Replicated is a Commercial Software Distribution Platform. Replicated helps software vendors distribute their applications into self-hosted environments like VPC, on-prem, air gap, and more. With a suite of tools ranging from installation, to testing, to licensing and support, Replicated is the best way to operationalize and scale the distribution of Kubernetes applications into any enterprise environment.
Our customers include KNIME, Puppet, Smartbear, BigID, Swimlane, and many other fast-growing enterprise software vendors.
Replicated is committed to cultivating an efficient, respectful workplace. We know that innovation thrives on teams where diverse points of view come together to solve hard problems in ways that are just now possible. As such, we explicitly seek people who bring diverse life experiences, diverse educational backgrounds, diverse cultures, and diverse work experiences.
We are fully remote and plan to stay that way. We're open to any state in the US. In addition, for some roles, we're open to candidates in Canada, the UK, Australia, and New Zealand.
Replicated is expanding its global Customer Reliability Engineering (CRE) team, a group of dedicated engineers focused on helping our vendors successfully deliver and support Kubernetes applications in customer-managed environments. As a CRE, you'll be on the front lines, working directly with customers to solve complex technical challenges related to application deployment, management, and troubleshooting. You'll gain deep expertise in Kubernetes, the Replicated product suite, and the intricacies of customer-managed deployments, including scenarios where cluster installation is required. Due to the team's success and growing responsibilities, we're expanding to meet the increasing demand and serve as a critical interface between our products and their real-world use by customers. This role prioritizes exceptional support and customer success, collaborating closely with senior CREs and product engineers.
This role is perfect for you if you are passionate about problem-solving, enjoy helping people, and thrive on diving deep into technical challenges. You'll leverage your operational knowledge to build best practices and contribute to tooling that empowers both our internal teams and our vendors. This is an excellent opportunity to build a strong foundation in Kubernetes, Linux, and the broader cloud-native ecosystem, while learning from experienced engineers on a successful, growing team.
What you'll be doing:
- Primary Focus: Provide expert support to customers, resolving issues related to Kubernetes, Linux, and Replicated products. This includes troubleshooting failures, identifying root causes, and implementing solutions. Every day will present new and unique challenges.
- Enable Customer Success: Work proactively with customers to ensure they are successfully deploying, managing, and scaling their applications using Replicated. This includes providing guidance, best practices, training, and assisting with onboarding new applications.
- Collaborate with Engineering: Work closely with senior CREs and product engineers to share customer feedback, identify product improvements, and contribute to the overall Replicated product roadmap. While this role doesn't require implementing code changes on day one, you'll be a key contributor in identifying areas for improvement, and the team regularly makes code contributions to enhance our products and tools. As you grow within the team, you'll have opportunities to develop your coding skills and contribute directly to these improvements.
- Continuous Learning: Invest in your personal and professional growth. Replicated is committed to supporting your development through courses, certifications, and other learning opportunities.
- 2+ years experience with Linux system administration
- 2+ years experience with Kubernetes and Helm
- Exceptional technical and non-technical communication and interpersonal skills. You must be able to clearly explain complex technical concepts to both technical and non-technical audiences.
- Strong problem-solving skills and the ability to think critically under pressure.
- A customer-centric mindset and a genuine desire to help others succeed.
- Experience with CNCF tools
- Familiarity with Go and the ability to debug Go programs
- Customer facing experience
In your first 30 Days:
- Immerse Yourself: Dedicate yourself to learning about Replicated - the company, the global CRE team, our products, and our customers (vendors).
- Hands-on Training: Complete comprehensive hands-on training with the Replicated platform, working through a structured onboarding checklist.
- Team Connections: Meet with team members across Replicated, including senior CREs, product engineers, and other departments, to build relationships and understand different perspectives.
- Onboarding Improvement: As you go through the onboarding process, actively identify areas for improvement and suggest changes to make it even better for future CREs.
- Active Support Participation: Begin working on real support cases from the queue, with direct oversight and guidance from senior CREs. This hands-on approach will accelerate your learning and understanding of customer issues and troubleshooting techniques.
- Deeper Support Immersion: Continue working on support cases, increasing the complexity and variety of issues you handle. Focus on understanding the "why" behind customer problems and the solutions implemented.
- Process Improvement: Proactively suggest improvements to the support process, both technical (e.g., tooling, diagnostics) and procedural (e.g., communication workflows, escalation paths).
- Product Knowledge Expansion: Deepen your understanding of how Replicated's products are developed, how different services interact, and how they are used in customer-managed environments.
- Vendor Interaction: Begin to participate in some supervised customer interactions, gradually taking on more responsibility under the guidance of senior CREs.
- Documentation Review: Review existing support documentation and training materials, identifying areas for updates or improvements.
- Independent Support: Take on full responsibility for handling support issues from the queue, working independently to diagnose, resolve, and prevent recurrence.
- On-Call Rotation: Join the on-call rotation, providing 24/7 support coverage (primarily weekends due to the global team) for specific Replicated products. Remember, you're never alone - the team is always available to support you.
- Customer Success Engagement: Begin actively participating in proactive customer success activities, such as assisting with onboarding new applications or providing best-practice guidance.
- Feedback Loop: Become a key contributor to the feedback loop between customers and engineering, sharing insights and identifying areas for product improvement.
- Continued Learning: Continue to invest in your personal and professional growth, leveraging Replicated's resources (like the curiosity budget) to expand your skills in Kubernetes, Linux, and other relevant technologies. Begin exploring opportunities to develop your Go coding skills.
In the US, the salary range for this role is as follows: $140,000 - $184,000.
For team members outside of the US, our salary ranges are at localized rates for the countries we support. This is dependent on several factors, including level, qualifications, and experience. We also offer stock options, a strong health insurance package, as well as a unique home office allowance & a professional development budget. An overview is on our careers page here: https://www.replicated.com/careers/
We invest in our team and love candidates who are eager to learn and grow. We have a fantastic team of highly collaborative individuals who enjoy learning, growing, and mentoring others.
OUR CORE VALUES
Care Deeply: Care deeply about the work that you do. Because of that you are constantly learning and willing to go out on a limb, challenge assumptions, go back to first principles, etc.
Longterm: Treat every interaction as part of a 30-year relationship; you'll see everyone down the road again as customers, partners, coworkers, etc.
Curious: We're always learning and we approach everyone and every problem with curiosity. When needed we challenge assumptions, and go back to first principles.
BENEFITS
We offer strong benefits to help you stay healthy and productive. For the US, our benefits are listed below:
- Health/Dental/Vision
- Life/AD&D
- LTD/STD
- FSA
- 401K
- Stock options
- Partner perk programs
- Generous time off; we expect you to take a minimum of 3 weeks per year
- Laptop and the accessories you need to get set up
- Generous home office setup allowance or co-working space allowance - up to $10,000 per year
- Curiosity Budget to help you keep learning and growing
We do not accept unsolicited assistance from any headhunters, recruitment firms or any other third party for any of our job openings. Any unsolicited resumes sent from anyone other than the candidate, in any format, to any person at Replicated, will be considered Replicated property. Replicated will NOT pay a fee for any placement resulting from the receipt of an unsolicited resume.