Safety and Resiliency Solutions Architect

2 weeks ago


Santa Clara, California, United States NVIDIA Full time

NVIDIA is a dynamic organization that continuously adapts by pursuing impactful opportunities that only we can address. We attract top talent to achieve our ultimate goal: to create a workplace that allows us to excel in our craft. We are currently looking for a Safety and Resiliency Architect to contribute to the development of hardware and software resiliency features for GPUs (Graphics Processing Units) and Tegra SoCs. In this position, you will play a crucial role in a team of innovators, challenging conventional methods and pushing the limits of technology. You will have the chance to influence the leading GPUs and SoCs that power diverse product lines, from consumer graphics to autonomous vehicles and the expanding domain of artificial intelligence.

Key Responsibilities:

  • Collaborate with hardware and software teams to design new resiliency and safety features and steer future advancements.
  • Enhance hardware and software functionalities to boost system reliability, performance, and security.
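  • Model and evaluate RAS (Reliability, Availability, and Serviceability) metrics such as Failures in Time (FIT) and Availability, and safety metrics such as Diagnostic Coverage and the Probabilistic Metric for random Hardware Failures (PMHF).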
  • Conduct simulations to assess Architectural Vulnerability Factor and Liveness of on-die memory.
  • Engage in testing both new and existing resiliency and safety hardware and software features.
  • Develop diagnostic software components for Resiliency and Safety to operate on NVIDIA GPUs.
  • Ensure product compliance with functional safety standards (ISO 26262 and ASPICE). This includes defining requirements, architecture, and design with comprehensive traceability; performing safety analyses (FMEA, DFA, FTA); and ensuring software compliance with MISRA and CERT C standards.

Qualifications:

  • Master's or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.
  • Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, buses, direct memory access, etc.).
  • Basic understanding of GPU/SoC architecture: clocks, resets, interrupts, memory controller, and multimedia accelerator pipelines.
  • Proficiency in C/C++.
  • Experience with scripting and automation using Python or similar languages.
  • Knowledge of the software development life cycle, from requirements gathering to testing and maintenance.
  • Strong debugging and analytical capabilities.
  • Self-motivated and results-oriented.

Preferred Qualifications:

  • Familiarity with general hardware concepts, Verilog RTL coding, and simulations/debugging.
  • Knowledge of GPU Architectures and Machine Learning/Deep Learning principles.
  • Experience with CUDA Programming.
  • Background in embedded software development.
  • Experience in resiliency and functional safety domains.

NVIDIA has been at the forefront of GPU innovation since 1999, transforming the PC gaming landscape, redefining modern computer graphics, and revolutionizing parallel computing. More recently, GPU deep learning has sparked the modern AI era, with GPUs serving as the brains of computers, robots, and autonomous vehicles capable of perceiving and understanding their environment. Today, we are increasingly recognized as "the AI computing company".

The base salary range is 120,000 USD - 230,000 USD, determined by your location, experience, and the compensation of employees in similar roles.

You will also be eligible for equity and benefits. NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We value diversity in our current and future employees and do not discriminate based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.


