Reliability Engineering Expert
5 days ago
Pure Storage, Inc. is a leading technology company dedicated to delivering innovative solutions for data storage and management.
Job Description
We are seeking an experienced Observability and Site Reliability Engineer to join our team in Santa Clara, CA. As an Observability and Site Reliability Engineer, you'll be responsible for managing and enhancing the observability of our systems, troubleshooting complex issues, and leading post-incident reviews. Your work will directly impact our ability to respond swiftly to incidents, minimize downtime, and improve customer satisfaction.
Key Responsibilities:
- Customer Escalation Management:
  - Act as the primary technical resource for high-impact customer escalations, working to diagnose, troubleshoot, and resolve incidents.
  - Coordinate with customer support and engineering teams to ensure issues are resolved quickly and accurately.
  - Serve as a technical point of contact during incidents, communicating status and resolution plans to relevant stakeholders.
- Observability and Monitoring:
  - Develop and maintain dashboards, alerts, and logging systems to track product performance.
  - Improve the observability and visibility of features through enhancements to monitoring, logging, and alerting.
  - Establish SLAs, SLIs, and SLOs to measure and ensure the reliability of the product and to proactively prevent escalations and Sev-1 incidents.
  - Identify trends in features that cause reliability issues.
- Collaboration and Communication:
  - Work cross-functionally with development, product, and support teams to enhance system reliability and customer experience.
  - Provide feedback to development teams on areas of improvement for code stability and reliability.
  - Mentor other engineers on best practices in observability and reliability engineering.
We're looking for someone with 7+ years of experience in SRE or a related field, with a strong focus on observability and customer-facing incident response. You should have proficiency in monitoring and observability tools, solid knowledge of programming languages like Python and Go, and experience with cloud platforms and container orchestration tools.
The estimated salary range for this role is $207,000 – $312,000 annually, depending on location and level of experience. This role may also be eligible for incentive pay and/or equity. We offer a comprehensive benefits package, including flexible time off, wellness resources, and company-sponsored team events.
Please note that we require candidates to work from our Santa Clara, CA office, unless approved otherwise. If you're passionate about delivering exceptional customer experiences and driving business growth, we encourage you to apply for this exciting opportunity.
-
Reliability Systems Expert
2 weeks ago
Santa Clara, California, United States | OmniVision Technologies | Full time
About OmniVision Technologies
We are a leading manufacturer of CMOS Image Sensors based in Santa Clara, CA.
Job Summary
We are seeking a highly skilled Reliability Systems Expert to join our team. In this role, you will be responsible for ensuring the high quality and reliability of our image sensors.
Responsibilities
Develop and implement reliability testing...
-
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Job
Palo Alto Networks is seeking an experienced Reliability Engineer to join our team. The ideal candidate will have a strong background in reliability engineering and networking products. The successful candidate will be responsible for establishing controls and document procedures related to NPI product quality and reliability, aiding Development...
-
Reliability Engineering Manager
1 week ago
Santa Clara, California, United States | Pure Storage, Inc. | Full time
Drive product reliability at scale as a Software Engineering Manager for Pure Storage's Fleet Reliability Engineering team. Lead a group dedicated to ensuring the highest reliability of customer FlashBlade systems. Your mission will be to manage and improve systems and processes that monitor and respond to fleet reliability issues, whether through reactive,...
-
Design Quality and Reliability Expert
2 days ago
Santa Clara, California, United States | Intel | Full time
Intel is at the forefront of innovation, driving progress in the world of technology. We are committed to enriching the lives of every person on earth. The Client Computing Group (CCG) is responsible for driving business strategy and product development for Intel's PC products and platforms. This role will be part of the Pre-Silicon Design Quality and...
-
Software Reliability Engineer
2 days ago
Santa Clara, California, United States | Roche Holdings Inc. | Full time
Responsibilities
The Principal DevOps Engineer will lead the design automation and perform deployment of various algorithms on dev, test, and production environments. This role requires excellent teamwork and mentoring skills, with a track record of guiding cross-functional teams to achieve DevOps automation goals and enhance overall productivity. The...
-
Enterprise System Reliability Expert
2 days ago
Santa Clara, California, United States | Sustainable Talent | Full time
Sustainable Talent is partnering with Nvidia to find a skilled Site Reliability Engineer to support their IPP (Infrastructure, Planning and Process) Team. This W-2 full-time contract, based in Santa Clara, CA, offers a competitive pay rate of $82-92/hr, depending on factors like experience, education, and location. The ideal candidate will have 5+ years of...
-
Santa Clara, California, United States | OmniVision Technologies | Full time
OmniVision Technologies, a leading CMOS Image Sensor Manufacturer, is seeking a highly skilled Staff Reliability Engineer to join its team in Santa Clara, CA.
About the Role
We are looking for an exceptional engineer with expertise in reliability systems to help us design and develop high-quality image sensors. As a Staff Reliability Engineer, you will play a...
-
Principal Site Reliability Engineer
4 weeks ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
Palo Alto Networks is seeking an experienced Principal Site Reliability Engineer to join our Cloud Infrastructure team. As a key member of our team, you will be responsible for designing, building, and maintaining scalable and reliable cloud infrastructure to support our mission-critical applications.
Key Responsibilities
Design and implement...
-
Santa Clara, California, United States | Cryptoware Technologies Inc | Full time
Job Overview
Cryptoware Technologies Inc is seeking a highly skilled and experienced Site Reliability Engineer to lead the expansion of our globe-spanning infrastructure.
-
Reliability and Efficiency Expert
1 day ago
Santa Rosa, California, United States | LanceSoft | Full time
Job Description
LanceSoft is seeking an experienced Facilities Equipment Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the reliable operation of our facilities equipment, including vacuum process equipment and facilities systems. Your expertise in predictive and preventive maintenance will be...
-
Expert Code Reasoning Engineer
2 weeks ago
Santa Clara, California, United States | Amazon | Full time
About Amazon
Amazon is a leader in cloud computing, providing scalable and secure services to customers around the world. Our Utility Computing (UC) organization is responsible for developing innovative products, such as Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2), that set us apart in the industry.
Job Description
We are seeking...
-
Expert Embedded Systems Engineer
3 weeks ago
Santa Clara, California, United States | Solomon Page | Full time
At Solomon Page, we are seeking a highly skilled Expert Embedded Systems Engineer to join our team. This role offers a unique opportunity to work on cutting-edge projects and contribute to the development of innovative solutions.
About the Role
We are looking for a talented engineer with expertise in embedded systems design, architecture, and development. The...
-
Software Development Engineer
2 days ago
Santa Clara, California, United States | Amazon | Full time
We are a passionate team at Amazon working to build a best-in-class healthcare product. We're looking for a full-stack Software Development Engineer focused on Front-end who can lead a motivated team to build new UI engineering solutions. This role requires deep technical expertise and the opportunity to engineer systems and build reliable and secure services...
-
Reliability and Quality Assurance Specialist
2 days ago
Santa Clara, California, United States | Intel | Full time
Join us at Intel, where we're shaping the future of technology. We're seeking a highly skilled Reliability and Quality Assurance Specialist to join our team. As a member of the Pre-Silicon Design Quality and Reliability Engineer (Pre-Si QRE) group, you'll support the development of CPU and Hard IPs on the most advanced Intel processes.
Key responsibilities...
-
Expert Generative AI Software Engineer
2 weeks ago
Santa Clara, California, United States | Promote Project | Full time
Transformative Role in Accelerated Computing
Promote Project is seeking a seasoned Expert Generative AI Software Engineer to spearhead the development and deployment of cutting-edge GenAI software solutions based on NVIDIA's pioneering technologies. As part of our esteemed team, you will collaborate with industry leaders and innovators to drive breakthroughs...
-
Timing Expert and Architect
2 weeks ago
Santa Clara, California, United States | SiTime Corporation | Full time
Job Title
System Architect, Synchronization and Timing
About the Role
The Time Synchronization System Architect serves as a resident time synchronization expert with a focus on datacenter, AI, 5G, and networking markets and applications. As a senior technical leader, you will leverage your timing expertise to develop deep technical relationships with key...
-
Quality Engineering Manager
3 weeks ago
Santa Clara, California, United States | Shockwave Medical | Full time
Overview
Shockwave Medical, Inc. is a pioneering medical device company that has revolutionized the treatment of complex calcified cardiovascular disease with its innovative Intravascular Lithotripsy (IVL) technology.
Salary
The estimated salary for this position ranges from $144,000 to $180,000 per year, depending on skills, experience, and location.
Job...
-
Semiconductor Applications Expert
3 days ago
Santa Clara, California, United States | Rohm Semiconductor | Full time
Semiconductor Applications Expert
We are seeking a highly skilled Semiconductor Applications Expert to join our team at ROHM Semiconductor. In this role, you will be responsible for providing technical support and guidance to customers and internal stakeholders related to magnetic and optical semiconductor components.
Key Responsibilities:
Provide technical...
-
Santa Clara, California, United States | Menlo Ventures | Full time
About the Role
We are seeking an experienced Senior Software Engineer II to join our team at Carta. In this role, you will have the opportunity to work on complex SaaS products with Service Oriented Architecture (SOA) and microservices. As a Senior Software Engineer II, you will be responsible for defining requirements and building solutions for products that...
-
Cybersecurity Engineer
2 weeks ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
We are seeking a talented Cybersecurity Engineer to join our team at Palo Alto Networks. As a key member of our engineering team, you will be responsible for designing and developing scalable microservices used to activate all Palo Alto Networks cloud products. Your primary focus will be on developing and delivering next-generation technologies...