Principal Infrastructure SRE

3 weeks ago


Santa Clara, United States NVIDIA Full time

NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, generative AI, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.
We are seeking a highly skilled Senior Staff Infrastructure Performance Engineer to join our dynamic team. Our company is at the forefront of technological innovation, and we are dedicated to driving efficiency and optimizing the performance of our infrastructure, both on-prem and in the cloud. Join us in this exciting endeavor.

What you will be doing:
- Lead initiatives to transform IT Compute platform architecture to build new service offerings across On-Prem & Cloud.
- Define and implement metrics to measure the efficiency of compute platforms & services and drive efficiency.
- Collect and review system data for capacity and planning purposes, analyze capacity data and develop plans for appropriate level enterprise-wide systems, and coordinate with management personnel in implementing changes.
- Develop and maintain tools for collecting, analyzing, and visualizing data for reporting, alerting, and monitoring.
- Collaborate with NVIDIA leadership, senior engineers, program managers, and product managers to develop compelling IT products and services that meet customer needs.

What we need to see:
- Bachelor’s degree in Engineering, Computer Science, Mathematics, or related field, or equivalent experience.
- 15+ years of proven experience in compute platform engineering with a focus on automation.
- Experience with the design and deployment of virtualization architectures, including VMware, OpenShift, or KubeVirt platforms.
- Strong analytical skills with the ability to define and track key performance metrics.
- Experience developing tools for data analysis and performance profiling, and development with Terraform and configuration management tools.
- Proficiency in programming languages such as Go and/or Python.
- Experience running large environments consisting of bare metal, large-scale virtualized environments with tens of thousands of VMs, and cloud infrastructure.

Ways to stand out from the crowd:
- Deep understanding of other infrastructure components such as storage, DNS, AD, and security tools.
- Hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Solid understanding of microservices architecture, infrastructure as code (IaC) and configuration management tools.
- Understanding of AI ops and how to leverage LLMs to automate various optimization initiatives.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.

The base salary range is 196,000 USD - 310,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.



  • Santa Clara, CA, United States Astera Labs Full time

    Astera Labs is a global leader in purpose-built connectivity solutions that unlock the full potential of cloud and AI infrastructure. Our Intelligent Connectivity Platform integrates PCIe, CXL and Ethernet semiconductor-based solutions based on a software-defined architecture that is both scalable and customizable. Inspired by trusted partnerships with...

  • DevOps SRE

    4 weeks ago


    Santa Clara, United States Palo Alto Networks Full time

    Palo Alto Networks is the fastest-growing security company in history. We offer the chance to be part of an important mission: ending breaches and protecting our way of digital life. If you are a motivated, intelligent, creative, and hardworking individual, then this job is for you! NOT YOUR PARENTS IT! The traditional IT organization is a thing of the past. As...


  • Santa Clara, United States NVIDIA Full time

    NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation that’s fueled by great technology and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, generative AI,...

  • SRE Engineer

    1 week ago


    Santa Clara, United States Omega Solutions Full time

    SRE Engineer - 01 Positions St Louis, MO (Onsite from day 1) Client Required Skills: • Bachelor's Degree in Computer Science, Computer Systems, Information Technology or related. Equivalent experience is acceptable. • Experience with web applications and distributed systems infrastructure. • Excellent verbal and written communication to a variety of...

  • SRE Engineer

    1 month ago


    Santa Clara, United States Omega Solutions Full time

    SRE Engineer - 01 Positions St Louis, MO (Onsite from day 1) Client Required Skills: • Bachelor's Degree in Computer Science, Computer Systems, Information Technology or related. Equivalent experience is acceptable. • Experience with web applications and distributed systems infrastructure. • Excellent verbal and written communication to a variety of...

  • SRE Engineer

    2 days ago


    Santa Clara, United States Omega Solutions Full time

    SRE Engineer - 01 Positions St Louis, MO (Onsite from day 1) Client Required Skills: • Bachelor's Degree in Computer Science, Computer Systems, Information Technology or related. Equivalent experience is acceptable. • Experience with web applications and distributed systems infrastructure. • Excellent verbal and written communication to a variety of...


  • Santa Clara, United States Kofi Group Full time

    Principal Site Reliability Engineer, San Francisco Bay Area, CA. We are partnering with a late-stage Cloud Security company that is looking for a Principal Level SRE. The ideal candidate will have: Strong sense of architecture and design for fault tolerance, scale-out approaches, and stability. Deep experience in building tools...


  • Santa Clara, United States Palo Alto Networks Full time

    Job Description. Your Career: We are seeking an automation-savvy Senior Principal QA Engineer as we scale the Prisma Access Test team. We are looking for a strong technical leader who takes ownership of their areas of focus and who is driven to solve problems at every level. Collaboration and teamwork are at the foundation of our culture and we need...


  • Santa Clara, United States NVIDIA Full time

    Senior Production SRE Engineer - Storage. Locations: US, CA, Santa Clara; US, Remote. Time type: Full time. Posted 4 days ago. Job requisition ID: JR1980966. Site Reliability Engineering (SRE) is an engineering discipline that involves designing, building, and maintaining large-scale production...


  • Santa Clara, United States NVIDIA Full time

    Principal Infrastructure Performance and Development Engineer. Location: US, CA, Santa Clara. Time type: Full time. Posted yesterday. Job requisition ID: JR1981842. Joining NVIDIA's AI Efficiency Team means contributing to the infrastructure that powers our leading-edge AI...


  • Santa Clara, United States Palo Alto Networks Full time

    Sr Principal Site Reliability Engineer (Advanced Threat Protection). At Palo Alto...

  • Senior Manager

    1 month ago


    Santa Clara, United States NVIDIA Full time

    As a Sr Manager in Site Reliability Engineering (SRE), you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, emphasizing high efficiency and availability. This role spans various domains, including software and systems engineering, cloud-scale storage, data management, and services. SRE Senior Managers...

  • Senior Manager

    2 days ago


    Santa Clara, United States NVIDIA Full time

    As a Sr Manager in Site Reliability Engineering (SRE), you will lead a team dedicated to the design, construction, and maintenance of expansive production systems, emphasizing high efficiency and availability. This role spans various domains, including software and systems engineering, cloud-scale storage, data management, and services. SRE Senior Managers...


  • Santa Clara, United States Palo Alto Networks Full time

    Job Description. Your Career: We are looking for a Principal DevOps/SRE to operate a large-scale GCP cloud in production, running our innovative SaaS cyber-security product, while continuously improving application deployment, monitoring, operability and uptime of the service. The Cortex XDR group specializes in analysis and visualization of complex...


  • Santa Clara, United States Palo Alto Networks Full time

    Your Career: The Palo Alto Networks SaaS Security team is looking for a seasoned and accomplished Senior Principal Software Engineer to help scale out our security platform with a sharp focus on platform and infrastructure capabilities. As a member of the team, you have the unique opportunity to: Be part of a world-class software engineering team that works on...

  • Director, IT

    1 week ago


    Santa Clara, United States NVIDIA Full time

    Director, IT - Compute Services. Location: US, CA, Santa Clara. Time type: Full time. Posted 12 days ago. Job requisition ID: JR1979519. NVIDIA is seeking a Director, IT - Compute Services within the IT Infrastructure organization. In this role, you will build and own the Compute platform Strategy for...


  • Santa Clara, CA, United States NVIDIA Corporation Full time

    Principal Infrastructure Performance and Development Engineer. Location: US, CA, Santa Clara. Time type: Full time. Posted yesterday. Job requisition ID: JR1981842. Joining NVIDIA's AI Efficiency Team means contributing to the infrastructure that powers our leading-edge...


  • Santa Clara, United States Palo Alto Networks Full time

    Our Mission: At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for...


  • Santa Clara, United States Palo Alto Networks Full time

    Company Description Our Mission At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are...


  • Santa Clara, United States Palo Alto Networks Full time

    Our Mission: At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for...