Senior SRE Engineering Leader
3 weeks ago
NVIDIA is a leader in the AI revolution, driving innovation across industries with our cutting-edge GPU technology. Our GPUs power groundbreaking advancements in AI, big data, and deep learning.
We're seeking a visionary leader to join us as Senior SRE Engineering Leader. As a key member of our team, you'll lead our globally distributed clusters, ensuring seamless operations and delivering AI services that drive breakthroughs in life sciences and natural language processing.
As SRE Leader, you'll build and operate large-scale GPU clusters across various cloud providers. You'll design and implement processes, tools, and systems that turn our extensive operational experience into improvements across the ecosystem.
Key responsibilities include:
- Managing distributed, multi-location GPU clusters for AI research
- Leading a team of SREs, driving cluster operational excellence and efficiency
- Delivering scalable distributed systems and AI services in fast-paced environments
- Building strong, globally distributed teams and driving technical strategy
- Collaborating across the company to improve the GPU ecosystem for AI use cases
- Solving reliability, efficiency, and productivity challenges for GPU infrastructure
- Defining strategy, managing projects, and driving technical leadership across multiple areas
- Collaborating with internal stakeholders to ensure transparency on budget and operational efficiency
Requirements include:
- 10+ years in engineering management; 3+ years in leadership roles
- Bachelor's or Master's in Computer Science or a related field, or equivalent experience
- Experience supporting AI/ML workloads and driving operational standard methodologies
- Strong Unix/Linux knowledge and proficiency in at least two programming languages (Perl, Python, Go)
- Expertise in managing large-scale distributed systems and AI/HPC environments
- Leadership experience, mentoring, and coaching skills
- Ability to quickly learn and integrate new technologies
- Strong collaboration skills across engineering, server, storage, and security teams
NVIDIA offers highly competitive salaries and a comprehensive benefits package. We're a company that values diversity and is committed to fostering a work environment that is inclusive and respectful. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.
-
Senior Network SRE Lead
4 weeks ago
Santa Clara, California, United States Diverse Lynx Full time
Job Title: Senior Network SRE. We are seeking a seasoned Senior Network SRE to lead our network infrastructure team in achieving Service Level Objectives (SLOs) and minimizing manual labor. Key Responsibilities: Owning the operational aspect of the network infrastructure, ensuring high availability and reliability. Partnering with architecture, tooling, and...
-
Senior Data Engineer, SRE
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description: We are seeking a highly skilled Sr. Data Engineer, SRE to join our team at NVIDIA. As a key member of our data science and reporting team, you will be responsible for designing and delivering high-performance services and libraries, building streaming data pipelines, and partnering with other engineering and business teams to integrate your...
-
Senior Network SRE Lead
4 weeks ago
Santa Clara, California, United States Diverse Lynx Full time
Job Summary: We are seeking a seasoned Network SRE technical lead to help actualize the SRE vision for our network infrastructure. As a key member of our Network Support and SRE team, you will be responsible for owning the operational aspect of the network infrastructure, ensuring its high availability and reliability. Key Responsibilities: Partner with...
-
Senior Cloud Infrastructure Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description: NVIDIA is seeking a Senior Site Reliability Engineer to join our AI Efficiency Team. As a key member of this team, you will contribute to the development of infrastructure that powers our innovative AI research. The AI Efficiency Team focuses on optimizing efficiency and resiliency of AI workloads, as well as developing scalable AI and Data...
-
Senior Cloud Reliability Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
At NVIDIA, we're seeking a highly skilled Senior Cloud Reliability Engineer to join our team. As a key member of our Site Reliability Engineering (SRE) team, you'll be responsible for designing, building, and maintaining large-scale production systems with high efficiency and availability. This is a highly specialized discipline that demands knowledge across...
-
Senior DevOps Engineer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: We are seeking a highly skilled Sr Staff Site Reliability Engineer to join our CDL/SLS team at Palo Alto Networks. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. As a Sr Staff Site Reliability Engineer, you will contribute to the success of our SRE...
-
Site Reliability Engineering Lead
1 week ago
Santa Clara, California, United States NVIDIA Full time
As a Senior Manager in Site Reliability Engineering (SRE) at NVIDIA, you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, emphasizing high efficiency and availability. This role spans various domains, including software and systems engineering, cloud-scale storage, data management, and services. SRE...
-
Senior Staff Site Reliability Engineer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About Us: Palo Alto Networks is a leader in the cybersecurity industry, dedicated to protecting the digital way of life. Our mission is to be the cybersecurity partner of choice, and we're looking for innovators who share our passion for shaping the future of cybersecurity. We're a company built on disruption, and we're looking for individuals who are...
-
Senior ASIC Physical Design Engineer
4 weeks ago
Santa Clara, California, United States Capgemini Engineering Full time
Job Title: Senior ASIC Physical Design Engineer. Job Summary: We are seeking a highly skilled Senior ASIC Physical Design Engineer to join our team at Capgemini Engineering. As a key member of our design team, you will be responsible for designing and implementing complex ASICs using cutting-edge technologies and tools. Key Responsibilities: Design and implement...
-
Senior ASIC Physical Design Engineer
3 weeks ago
Santa Clara, California, United States Capgemini Engineering Full time
Job Title: Senior ASIC Physical Design Engineer. Job Summary: We are seeking a highly skilled Senior ASIC Physical Design Engineer to join our team at Capgemini Engineering. As a key member of our design team, you will be responsible for the implementation of complex ASICs, focusing on high frequency block timing closure and physical verification. Key...
-
Senior Wireless Network Engineer
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Description: We are seeking a highly skilled Senior Wireless Network Engineer to join our team at NVIDIA. As a key member of our Network Support and SRE team, you will play a critical role in ensuring the high availability and reliability of our wireless infrastructure. Your primary responsibilities will include owning the operational aspect of the wireless...
-
Senior Integration Developer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: We are seeking an experienced Senior Integration Developer to join our dynamic IT team at Palo Alto Networks. As a key member of our team, you will play a critical part in driving the transformation of our integration landscape, improving transaction speed, data accuracy, and ensuring a seamless user experience. You will be responsible for...
-
Senior Backend Software Engineer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly skilled Senior Backend Software Engineer to join our team. As a key member of our engineering team, you will be responsible for designing and developing distributed backend services that serve as the backbone of our cloud-delivered security platform. Key Responsibilities: Analyze requirements and design,...
-
Senior Geotechnical Engineer and Team Leader
3 weeks ago
Santa Clara, California, United States TRC Companies Full time
About Us: At TRC, we're a team of innovators, thinkers, and problem-solvers who are passionate about shaping a brighter, more sustainable future. Our commitment to safety, quality, integrity, creativity, accountability, teamwork, and passion drives everything we do. We're a leading provider of geo-environmental consulting services, and we're seeking a talented...
-
Senior Staff Site Reliability Engineer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: We are seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team at Palo Alto Networks. As a key member of our team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps,...
-
Senior Manager
4 weeks ago
Santa Clara, California, United States NVIDIA Full time
Job Summary: NVIDIA is seeking a highly experienced Senior Manager to lead our Storage Systems team. As a key member of our Site Reliability Engineering (SRE) organization, you will be responsible for designing, implementing, and maintaining scalable and reliable storage systems to support our cloud infrastructure. Key Responsibilities: Lead a team of Storage SRE...
-
Santa Clara, California, United States XPENG Motors Full time
Job Title: Senior Staff AI Infrastructure SRE. Xpeng Motors is a leading smart electric vehicle company that designs, develops, and manufactures cutting-edge EVs with advanced Internet, AI, and autonomous driving technologies. We are committed to in-house R&D and intelligent manufacturing to create a better mobility experience for our customers. About the...
-
Senior Integration Developer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
Job Summary: We are seeking an experienced Senior Integration Developer to join our dynamic IT team at Palo Alto Networks. As a key member of our team, you will be responsible for designing, developing, and implementing scalable integration solutions using SnapLogic and other cutting-edge technologies. As a Senior Integration Developer, you will collaborate...
-
Senior Cloud Infrastructure Engineer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
About the Role: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our Cortex Data Lake team. As a key member of our team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of our SRE and DevOps teams by developing...
-
Senior Staff Site Reliability Engineer
4 weeks ago
Santa Clara, California, United States Palo Alto Networks Full time
Job Description: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team. As a key member of our infrastructure team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Develop expertise in new technologies and contribute to the...