Senior Site Reliability Engineer
2 days ago
We are expanding our leadership into datacenter networking with ethernet switches, NICs, and DPUs. Our team is responsible for designing and operating large-scale GPU compute clusters that power all AI research across NVIDIA.
Key Responsibilities:
- Design and implement state-of-the-art GPU compute clusters
- Optimize cluster operations for maximum reliability, efficiency, and performance
- Drive foundational improvements and automation to enhance researcher productivity
- Tackle strategic challenges in large-scale, high-performance computing environments
Requirements:
- Bachelor's degree in Computer Science, Electrical Engineering, or related field
- Proven experience in site reliability engineering for high-performance computing environments
- Deep understanding of GPU computing and AI infrastructure
- Passion for solving complex technical challenges and optimizing system performance
Benefits:
- Competitive salary and comprehensive benefits package
- Opportunity to work with a world-class engineering team
- Chance to contribute to cutting-edge AI and HPC research
-
Senior Site Reliability Engineer
1 week ago
Santa Clara, California, United States | NVIDIA | Full time
Job Title: Senior Site Reliability Engineer
NVIDIA is a leader in AI, machine learning, and datacenter acceleration. Our company is expanding its leadership into datacenter networking with ethernet switches, NICs, and DPUs. We have continuously reinvented ourselves over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market,...
-
Senior Site Reliability Engineer
4 days ago
Santa Clara, California, United States | NVIDIA | Full time
Job Title: Senior Site Reliability Engineer
NVIDIA is a leader in AI, machine learning, and datacenter acceleration. Our company is expanding its leadership into datacenter networking with ethernet switches, NICs, and DPUs. We have continuously reinvented ourselves over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market,...
-
Senior Site Reliability Engineer
1 month ago
Santa Clara, California, United States | ServiceNow | Full time
Company Overview
At ServiceNow, we harness technology to create a better world for everyone, driven by our talented workforce. We prioritize speed and innovation to meet the demands of our customers and communities.
Joining ServiceNow means becoming part of a dynamic team of innovators who possess a relentless curiosity and a commitment to creativity. We...
-
Senior Staff Site Reliability Engineer
18 hours ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure.
Key Responsibilities:
- Develop expertise in new technologies and contribute to the success of SRE and...
-
Senior Staff Site Reliability Engineer
7 days ago
Santa Clara, California, United States | Palo Alto Networks | Full time
Job Description
Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure.
Key Responsibilities:
- Contribute to the success of SRE and DevOps teams
- Develop expertise...
-
Senior Site Reliability Engineer
1 month ago
Santa Clara, California, United States | ServiceNow | Full time
Company Overview
At ServiceNow, we harness technology to enhance global operations, and our dedicated workforce makes it all possible. We operate swiftly because the world demands it, innovating uniquely for our clients and communities.
By becoming part of ServiceNow, you join a dynamic team of innovators who possess a relentless curiosity and a passion for...
-
Site Reliability Engineer
3 weeks ago
Santa Clara, California, United States | Diverse Lynx | Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based applications and infrastructure.
Key Responsibilities:
- Design, implement, and maintain cloud infrastructure on...
-
Senior Site Reliability Engineer
1 day ago
Santa Clara, California, United States | Nvidia | Full time
Job Title: Senior Site Reliability Engineer - HPC Storage
NVIDIA is a leader in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. We are seeking a phenomenal Senior Site Reliability Engineer to join our team and play a crucial role in designing, implementing, and optimizing on-prem High-Performance...
-
Site Reliability Engineer
4 days ago
Santa Clara, California, United States | Insight Global | Full time
About the Role
We are seeking a seasoned Site Reliability Engineer to join our team at Insight Global. As a key member of our Infrastructure, Planning and Processes organization, you will be responsible for developing and maintaining sophisticated internal cloud provisioning products.
Key Responsibilities:
- Collaborate with various teams,...
-
Site Reliability Engineer
7 days ago
Santa Clara, California, United States | Diverse Lynx | Full time
Job Description
We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems.
Key Responsibilities:
- Design, implement, and maintain scalable and highly available cloud...
-
Site Reliability Engineer
6 days ago
Santa Clara, California, United States | Syntricate Technologies | Full time
Job Description
We are seeking a highly skilled Site Reliability Engineer to join our team at Syntricate Technologies. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems.
Key Responsibilities:
- Design, implement, and maintain cloud infrastructure on AWS,...
-
Principal Site Reliability Engineer
7 days ago
Santa Clara, California, United States | Palo Alto Networks | Full time
Job Title: Principal Site Reliability Engineer
Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure.
About the Role
We are looking for a seasoned engineer with expertise in...
-
Principal Site Reliability Engineer
2 days ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. You will work closely with developers, researchers, data scientists, and security experts to ensure...
-
Principal Site Reliability Engineer
2 weeks ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Palo Alto Networks. As a Site Reliability Engineer, you will play a critical role in designing, building, and maintaining scalable and reliable infrastructure for our FedRAMP SASE product portfolio.
Key Responsibilities:
- Design and implement scalable and reliable...
-
Principal Site Reliability Engineer
1 day ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining scalable and reliable infrastructure to support our mission-critical platforms.
Key Responsibilities:
- Design and implement scalable and...
-
Principal Site Reliability Engineer
1 week ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Palo Alto Networks. As a Site Reliability Engineer, you will play a critical role in designing, building, and maintaining scalable and reliable infrastructure for our FedRAMP SASE product portfolio.
Key Responsibilities:
- Design and implement scalable and reliable...
-
Principal Site Reliability Engineer
7 days ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Palo Alto Networks. As a Site Reliability Engineer, you will play a critical role in designing, building, and maintaining scalable and reliable infrastructure for our FedRAMP SASE product portfolio.
Key Responsibilities:
- Design and implement scalable and reliable...
-
Cloud Site Reliability Engineer
7 days ago
Santa Clara, California, United States | Centrify Corporation | Full time
At Centrify Corporation, we're seeking a skilled Cloud Site Reliability Engineer to join our Cloud DevOps team. As a key member of our operations team, you'll play a critical role in ensuring the uptime and delivery of our cloud-based services.
Key Responsibilities:
- Manage our cloud application using DevOps and Agile practices to...
-
Senior Reliability Engineer
4 days ago
Santa Clara, California, United States | Omni Vision Inc | Full time
Job Title: Senior Reliability Engineer
Omni Vision Inc is seeking a highly skilled Senior Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for ensuring the quality and reliability of our CMOS Image Sensor products.
Key Responsibilities:
- Review reliability qualification testing results and determine whether...
-
Principal Site Reliability Engineer
3 days ago
Santa Clara, California, United States | Palo Alto Networks | Full time
About the Role
Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure.
Key Responsibilities:
- Contribute to the success of SRE and DevOps teams
- Develop expertise in new...