Senior Cloud Reliability Engineer

6 days ago


Santa Clara, California, United States NVIDIA Full time
Job Description

NVIDIA is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large-scale Observability & Telemetry collection platforms.

Key Responsibilities:

  • Design and implement operational and reliability aspects of large-scale Observability & Telemetry collection platforms.
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software tools, platforms, and frameworks, capacity management, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.
  • Be part of an on-call rotation to support production systems.

Requirements:

  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
  • 5+ years of experience with infrastructure automation, distributed systems design, and designing and developing tools for running large-scale private or public cloud systems in production.
  • 5+ years of experience delivering foundational infrastructure and observability platforms.
  • Experience in one or more of the following: Python, Go, Perl, or Ruby.
  • In-depth knowledge of Linux, Networking, and Containers.

What We Offer:

  • A competitive base salary range of $148,000 - $276,000 USD.
  • Eligibility for equity and benefits.
  • A diverse and inclusive work environment.

NVIDIA is an equal opportunity employer and welcomes applications from diverse candidates.



  • Santa Clara, California, United States Geospatial And Cloud Analytics Inc Full time

    About the Role: We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team at Geospatial And Cloud Analytics Inc. As a key member of our engineering team, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large-scale cloud infrastructure. Key Responsibilities: Design and implement...


  • Santa Clara, California, United States NVIDIA Full time

    Job Description: NVIDIA is seeking a highly skilled Senior Cloud Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large-scale Kubernetes clusters. Key Responsibilities: Design and implement operational and reliability aspects...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: The Global Customer Operation Team at Palo Alto Networks is responsible for building products that safeguard data, workloads, and infrastructure for some of the world's largest enterprise customers. Our team helps customers navigate their journey to the public cloud by ensuring reliability, liability, and software architect expertise. Key...


  • Santa Clara, California, United States Centrify Corporation Full time

    Cloud Site Reliability Engineer: At Centrify Corporation, we're seeking a skilled Cloud Site Reliability Engineer to join our Cloud DevOps team. As a key member of our operations team, you'll play a critical role in ensuring the uptime and delivery of our cloud-based services. Key Responsibilities: Manage our cloud application using DevOps and Agile practices to...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a key member of our infrastructure platform, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams by developing expertise...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Title: Senior Cloud Infrastructure Engineer. Palo Alto Networks is seeking a highly skilled Senior Cloud Infrastructure Engineer to join our team. As a key member of our Cloud Infrastructure team, you will be responsible for designing, building, and operating scalable and secure cloud infrastructure. About the Role: We are looking for a talented engineer with...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team at Palo Alto Networks. As a key member of our team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams. Develop expertise in new...


  • Santa Clara, California, United States Veear Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Veear. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and security of our cloud-based systems. Key Responsibilities: Collaboration and Partnership: Partner with cross-functional teams to ensure security...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure. Key Responsibilities: Develop expertise in new technologies and contribute to the success of SRE and...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Description: Palo Alto Networks is seeking a highly skilled Senior Staff Site Reliability Engineer to join our CDL/SLS team. As a key member of our engineering team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of SRE and DevOps teams. Develop expertise...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Title: Senior Cloud Security Engineer. We are seeking a highly skilled Senior Cloud Security Engineer to join our team at Palo Alto Networks. As a Senior Cloud Security Engineer, you will be responsible for designing and developing secure cloud-based solutions for our customers. Key Responsibilities: Design and develop secure cloud-based solutions for our...


  • Santa Clara, California, United States Centrify Corporation Full time

    Cloud Site Reliability Engineer: At Centrify Corporation, we're committed to delivering high-quality, mission-critical cloud-based services to our customers. As a Cloud Site Reliability Engineer, you'll play a critical role in ensuring the uptime and reliability of our cloud applications. Key Responsibilities: Manage our cloud application using DevOps and Agile...


  • Santa Clara, California, United States Cloud Integrator Inc Full time

    Senior ServiceNow Engineer (UI/Backend) Opportunity: Cloud Integrator Inc is seeking a highly skilled Senior ServiceNow Engineer to join our team in Santa Clara, CA. As a key member of our engineering team, you will be responsible for designing and implementing ServiceNow solutions, focusing on UI/portal development and back-end configuration. Key...


  • Santa Clara, California, United States NVIDIA Full time

    Job Title: Senior Cloud Infrastructure Engineer. NVIDIA is seeking a highly skilled Senior Cloud Infrastructure Engineer to join our Infrastructure, Planning and Process (IPP) team. As a key member of our global organization, you will be responsible for designing, building, and maintaining our cloud infrastructure to support the development and deployment of...

  • Senior Cloud Engineer

    3 weeks ago


    Santa Clara, California, United States NVIDIA Full time

    About the Role: NVIDIA is seeking a seasoned Cloud Engineer to join its fast-paced Infrastructure, Planning and Processes organization. As a Senior Cloud Engineer, you will be part of a dynamic team that develops and maintains NVIDIA's internal cloud provisioning product for GPUs and Tegra systems. Key Responsibilities: Design and implement scalable, resilient...


  • Santa Clara, California, United States ServiceNow Full time

    Overview: The ServiceNow SRE team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability, and performance of the ServiceNow cloud infrastructure. Our SREs are empowered to drive technical resolutions across the technology stack from hardware through to application and all stops in between. They are...


  • Santa Clara, California, United States NVIDIA Full time

    Join NVIDIA's AI Efficiency Team: We are seeking a Senior Site Reliability Engineer to contribute to the infrastructure that powers our innovative AI research. About the Role: This team focuses on optimizing efficiency and resiliency of AI workloads, as well as developing scalable AI and Data infrastructure tools and services. Our objective is to deliver a stable,...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Senior Staff DevOps Engineer to join our Cloud Infrastructure team. As a key member of our team, you will be responsible for designing, building, and operating reliable, secure cloud infrastructure to support our mission-critical applications. Key Responsibilities: Design and implement scalable,...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Title: Senior Software Engineer - Cloud Security. We are seeking an experienced Senior Software Engineer to join our Prisma Access Edge Platform team at Palo Alto Networks. As a key member of our team, you will design, develop, and implement highly scalable and reliable software features using custom and open-source software. Key Responsibilities: Design and...