Senior Cloud Performance Architect

4 weeks ago


Santa Clara, California, United States NVIDIA Full time

NVIDIA is seeking a highly skilled Cloud AI Infrastructure Engineer to drive the performance analysis, optimization, and modeling of NVIDIA DGX™ Cloud clusters.

The ideal candidate will have a deep understanding of the methodology to conduct end-to-end performance analysis of critical AI applications running on large-scale parallel and distributed systems.

Candidates will work closely with multi-functional teams to define DGX Cloud cluster architecture for different cloud service providers, optimize workloads running on these systems, and develop the methodology that will drive the HW-SW codesign cycle to develop elite AI infrastructure at scale and make it more easily consumable by users.

Key Responsibilities:

  • Develop benchmarks and end-to-end customer applications that run at scale and are instrumented for performance measurement, tracking, and sampling, in order to measure and optimize the performance of meaningful applications and services.
  • Construct carefully designed experiments to analyze, study, and develop critical insights into performance bottlenecks and dependencies from an end-to-end perspective.
  • Develop ideas on how to improve the end-to-end system performance and usability by leading changes in the HW or SW (or both).
  • Collaborate with external cloud service providers during the full life cycle of cluster deployment and workload optimization to understand and drive standard methodologies.
  • Collaborate with AI researchers, developers, and application service providers to understand difficulties and requirements, project future needs, and share best practices.
  • Work with a diverse set of large language model workloads and their application areas, such as healthcare, climate modeling, pharmaceuticals, financial futures, genomics, and drug discovery.
  • Develop the vital modeling framework and the total cost of ownership analysis to enable efficient exploration and sweep of the architecture and design space.
  • Develop the methodology needed to drive the engineering analysis to advise the architecture, design, and roadmap of DGX Cloud.

Requirements:

  • 7+ years of proven experience.
  • Ability to work with large-scale parallel and distributed accelerator-based systems.
  • Expertise in optimizing the performance of AI workloads on large-scale systems.
  • Experience with performance modeling and benchmarking at scale.
  • Strong background in computer architecture, networking, storage systems, and accelerators.
  • Familiarity with popular AI frameworks, such as PyTorch, TensorFlow, JAX, Megatron-LM, and TensorRT-LLM.
  • Experience with AI/ML models and workloads, particularly large language models.
  • Understanding of deep neural networks and their use in emerging AI/ML applications and services.
  • Bachelor's or Master's in Engineering (preferably Electrical Engineering, Computer Engineering, or Computer Science) or equivalent experience.
  • Proficiency in Python and C/C++.
  • Expertise with at least one public cloud infrastructure (GCP, AWS, Azure, OCI, etc.).

Preferred Qualifications:

  • Very high intellectual curiosity.
  • Confidence to dig in as needed.
  • Not afraid of confronting complexity.
  • Able to pick up new areas quickly.
  • Proficiency in CUDA and XLA.
  • Excellent interpersonal skills.
  • PhD (nice to have).

NVIDIA offers competitive salaries and a generous benefits package. We are widely considered to be one of the technology world's most desirable employers. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you.


