GPU Fleet Engineer – Hyperscale Infra, Kubernetes

3 weeks ago


San Francisco, United States OpenAI Full time

Join a forward-thinking company as an engineer in the fleet infrastructure team, where you'll design and operate systems for one of the largest GPU fleets globally. This role offers the chance to work in a hybrid setting while contributing to cutting-edge AI capabilities. Your expertise in hyperscale compute systems and programming will be crucial in shaping infrastructure that supports model deployment and training. Collaborate with diverse teams to ensure high reliability and utilization, all while being part of a mission-driven organization that values safety and human needs in AI development.



  • San Francisco, United States Genmo Full time

    Site Reliability Engineer — GPU Infrastructure Join Genmo, a research lab dedicated to building open, state‑of‑the‑art models for video generation. We are looking for a Site Reliability Engineer to build and operate GPU infrastructure that powers our generative models. This is a contract‑to‑hire position. What You’ll Do Own design and...


  • San Francisco, United States Genmo Full time

    Site Reliability Engineer - GPU Infrastructure Join Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. We are looking for a Site Reliability Engineer to build and operate GPU infrastructure that powers our generative models. This is a contract-to-hire position. What You'll Do Own design and day-to-day operation of...


  • San Francisco, United States OpenAI Full time

    This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world’s largest, most reliable, and frictionless GPU fleet to support OpenAI’s general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...


  • San Francisco, United States OpenAI Full time

    This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world's largest, most reliable, and frictionless GPU fleet to support OpenAI's general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...


  • San Francisco, United States Genmo Full time

    We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation. What You’ll Do: Own the design and day‑to‑day operation of GPU clusters that train and serve frontier...

  • Research Engineer

    4 weeks ago


    San Francisco, United States Storm3 Full time

    This range is provided by Storm3. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: $200,000.00/yr - $350,000.00/yr. Research Engineer - ML Sys, Infra Optimization and Scaling. Come join a revolutionary AI research lab in SF Bay Area that is...


  • San Francisco, CA, United States OpenAI Full time

    This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world's largest, most reliable, and frictionless GPU fleet to support OpenAI's general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...



  • AI Infra Engineer

    4 weeks ago


    San Francisco, United States Pantera Capital Full time

    Location: San Francisco | Employment Type: Full time | Location Type: Hybrid | Department: AI. We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, PyTorch, and primarily on AWS. As an AI Infrastructure Engineer, you will be partnering closely with our Inference and Research teams to build, deploy, and optimize...


  • San Francisco, California, United States Aimhire Full time

    Senior Infrastructure Engineer San Francisco, CA (On-site) | Full-time | Visa Sponsorship Available. About the Role: Our client is looking for a Senior Infrastructure Engineer with 6+ years of experience scaling large, reliable systems at startups that have grown to hyperscale. The ideal candidate is deeply technical, high-agency, and thrives in fast-moving...