Cloud Engineering Infrastructure Development Lead

6 days ago


Santa Clara, California, United States Oracle Full time
Job Description

Job Summary: We are seeking a highly skilled and experienced Senior Principal Software Engineer to join our Cloud Engineering Infrastructure Development team at Oracle. As a key member of our team, you will be responsible for designing, developing, and performance-tuning the networking stack required to run distributed AI/ML/HPC workloads across thousands of GPUs.

Key Responsibilities:

  • Design and develop high-performance networking systems for AI/ML/HPC workloads
  • Collaborate with cross-functional teams to integrate networking solutions with distributed systems
  • Develop and maintain collective communications libraries and GPU frameworks
  • Troubleshoot and optimize network performance for large-scale distributed systems
  • Stay up-to-date with industry trends and emerging technologies in cloud networking and AI/ML/HPC

Requirements:

  • 10+ years of experience in software development with a focus on high-performance networking and distributed systems
  • 3+ years of experience with RDMA over InfiniBand networks, including setup, troubleshooting, tuning, and scaling
  • 3+ years of experience with collective communications libraries and GPU frameworks
  • Proficient in data structures, algorithms, and operating systems
  • Excellent organizational, verbal, and written communication skills
  • Bachelor's degree in Computer Science or related engineering fields

Preferred Qualifications:

  • Master's or PhD degree in Computer Science or related engineering fields
  • Experience with distributed workload managers like Slurm or K8s
  • Experience with ML training frameworks like PyTorch or TensorFlow
  • Experience with Linux performance tools
  • Experience in SDN, NFV, cloud networking, and infrastructure-as-a-service

What We Offer:

  • Competitive salary range: $96,800 to $251,600 per annum
  • Eligibility for bonus, equity, and compensation deferral
  • Comprehensive benefits package, including medical, dental, and vision insurance, short-term and long-term disability, life insurance, and more

  • Software Engineer

    2 weeks ago


    Santa Clara, California, United States Oracle Full time

    Software Engineer - Cloud Engineering Infrastructure Development. Oracle is seeking a skilled Software Engineer to design, develop, and troubleshoot software programs for various purposes, including file storage, databases, applications, tools, and networks. Key Responsibilities: Collaborate with cross-functional teams to define and develop software for tasks...


  • Santa Clara, California, United States NVIDIA Full time

    NVIDIA is seeking talented engineers to enhance its AI Infrastructure. We are looking for individuals with a robust programming foundation, profound knowledge of distributed systems, and a strong grasp of software testing and deployment methodologies. Excellent communication and organizational skills are essential. We value innovative thinkers who can...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: Palo Alto Networks is seeking a highly skilled Senior Principal Cloud Reliability Engineer to join our team. As a key member of our cloud infrastructure team, you will be responsible for designing, building, and operating reliable and secure cloud infrastructure. Key Responsibilities: Contribute to the success of our cloud infrastructure team by...


  • Santa Clara, California, United States eTeam Full time

    Job Description. Job Title: Cloud Infrastructure Architect. Location: Remote (with occasional travel). Job Type: Full-time. About eTeam: eTeam is a leading provider of cloud-based solutions, dedicated to delivering innovative and secure infrastructure to our clients. Job Summary: We are seeking an experienced Cloud Infrastructure Architect to join our team. The ideal...


  • Santa Clara, California, United States NVIDIA Full time

    The NVIDIA GPU Cloud (NGC) team is seeking experienced software engineers to develop NVIDIA's advanced compute cloud solutions. These solutions encompass software for managing hardware and network provisioning to create a multi-tenant infrastructure. As a software engineer, you will collaborate with fellow engineers, product architects, and product managers...


  • Santa Ana, California, United States Rancho Santiago Community College District Full time

    Position Title: Cloud Infrastructure Engineer. Overview: The Cloud Infrastructure Engineer is responsible for overseeing the district's cloud computing resources, focusing on Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solutions. This role requires a deep understanding of cloud architecture, security protocols, and operational...


  • Santa Clara, California, United States Geospatial And Cloud Analytics Inc Full time

    About the Role: We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team at Geospatial And Cloud Analytics Inc. As a key member of our engineering team, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large-scale cloud infrastructure. Key Responsibilities: Design and implement...


  • Santa Clara, California, United States Astera Labs Full time

    Astera Labs stands at the forefront of innovative connectivity solutions, enabling the full potential of AI and cloud infrastructure. Our Intelligent Connectivity Platform seamlessly integrates PCIe, CXL, and Ethernet semiconductor-based solutions alongside the COSMOS software suite, delivering a software-defined architecture that is both scalable and...


  • Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly skilled Senior Cloud Engineer to join its Infrastructure, Planning and Processes organization. As a Senior Cloud Engineer, you will be part of a fast-paced team that develops and maintains NVIDIA's internal cloud provisioning product for GPUs and Tegra systems. Key Responsibilities: Design and implement scalable, resilient...


  • Santa Clara, California, United States TechStar Group Full time

    Job Title: Cloud Infrastructure Architect. Job Summary: We are seeking a highly skilled Cloud Infrastructure Architect to join our team at TechStar Group. As a key member of our infrastructure team, you will be responsible for designing, implementing, and managing our cloud infrastructure to ensure high levels of performance, availability, and security. Key...


  • Santa Clara, California, United States NVIDIA Full time

    About the Role: NVIDIA is seeking a seasoned Cloud Engineer to join its fast-paced Infrastructure, Planning and Processes organization. As a Senior Cloud Engineer, you will be part of a dynamic team that develops and maintains NVIDIA's internal cloud provisioning product for GPUs and Tegra systems. Key Responsibilities: Design and implement scalable, resilient...


  • Santa Clara, California, United States Palo Alto Networks Full time

    About the Role: We are seeking a highly experienced Senior Manager, Software Engineering to lead our Cloud Security Engineering team at Palo Alto Networks. As a key member of our engineering team, you will be responsible for driving and delivering our next-generation virtualization products and cloud security offerings. Key Responsibilities: Lead and expand a...


  • Santa Clara, California, United States ServiceNow Full time

    Job Description. Overview: The ServiceNow SRE team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability, and performance of the ServiceNow cloud infrastructure. Our SREs are empowered to drive technical resolutions across the technology stack from hardware through to application and all stops in...


  • Santa Monica, California, United States GoodRx Full time

    GoodRx serves as a pivotal healthcare marketplace in the United States, assisting millions of individuals each month in locating trustworthy health information and securing discounts on their healthcare expenses. Since its inception, GoodRx has facilitated savings of $60 billion for consumers, providing access to prescription discounts accepted at over...


  • Santa Clara, California, United States Cryptoware Technologies Inc Full time

    Job Description. Job Summary: Cryptoware Technologies Inc is seeking a highly skilled Global Infrastructure Expansion Lead to join our team. As a key member of our engineering team, you will be responsible for leading the effort of global expansion of our globe-spanning infrastructure. Key Responsibilities: Lead the effort of global expansion of our globe-spanning...


  • Santa Clara, California, United States XPENG Motors Full time

    About XPeng Motors: XPeng Motors is a leading innovator in the electric vehicle industry, dedicated to designing, developing, and manufacturing cutting-edge smart electric vehicles that seamlessly integrate advanced Internet, AI, and autonomous driving technologies. Job Summary: We are seeking a highly skilled Senior Staff AI Infrastructure Site Reliability...

  • Senior IT Engineer

    2 weeks ago


    Santa Clara, California, United States OmniVision Technologies Full time

    About OmniVision Technologies: We are a leading developer of advanced digital imaging solutions, providing a diverse culture that works together on the development of cutting-edge imaging technology, products, and solutions. Job Summary: We are seeking a highly skilled Senior IT Engineer to lead our cloud infrastructure team. The successful candidate will be...


  • Santa Clara, California, United States NVIDIA Full time

    We are looking for a Lead Cloud Software Engineer to become a vital member of the DRIVE Sim Cloud team at NVIDIA. In this position, you will play a key role in shaping the future of autonomous vehicle technology. You will thrive in a fast-paced environment where creativity and challenging conventional methods are encouraged. Your proficiency in backend...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Overview. Company Overview: Palo Alto Networks is dedicated to safeguarding our digital existence. Our mission is to be the premier cybersecurity partner, ensuring a secure and safe environment for everyone. Vision: We envision a future where each day is more secure than the last. Our foundation is built on innovation and a commitment to redefining the...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Job Overview. Your Career Journey: Utilize your expertise in backend Java cloud engineering to contribute to cutting-edge cloud software and web applications. Join us in deploying and scaling the next generation of cloud security, leveraging big data and analytics. We are seeking a Principal Engineer to be part of the team dedicated to developing our latest cloud...