GPU Fleet Engineer – Hyperscale Infra, Kubernetes

3 weeks ago

San Francisco, United States OpenAI Full time

Join a forward-thinking company as an engineer in the fleet infrastructure team, where you'll design and operate systems for one of the largest GPU fleets globally. This role offers the chance to work in a hybrid setting while contributing to cutting-edge AI capabilities. Your expertise in hyperscale compute systems and programming will be crucial in shaping infrastructure that supports model deployment and training. Collaborate with diverse teams to ensure high reliability and utilization, all while being part of a mission-driven organization that values safety and human needs in AI development.

Site Reliability Engineer — GPU Infrastructure

4 days ago

San Francisco, United States Genmo Full time

Join Genmo, a research lab dedicated to building open, state‑of‑the‑art models for video generation. We are looking for a Site Reliability Engineer to build and operate GPU infrastructure that powers our generative models. This is a contract‑to‑hire position. What You’ll Do: Own design and...
Site Reliability Engineer GPU Infrastructure

4 weeks ago

San Francisco, United States Genmo Full time

Join Genmo, a research lab dedicated to building open, state-of-the-art models for video generation. We are looking for a Site Reliability Engineer to build and operate GPU infrastructure that powers our generative models. This is a contract-to-hire position. What You'll Do: Own design and day-to-day operation of...
Software Engineer, GPU Infrastructure

1 day ago

San Francisco, United States OpenAI Full time

This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world’s largest, most reliable, and frictionless GPU fleet to support OpenAI’s general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...
Software Engineer, GPU Infrastructure

3 weeks ago

San Francisco, United States OpenAI Full time

This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world's largest, most reliable, and frictionless GPU fleet to support OpenAI's general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...
Senior Site Reliability Engineer GPU Infrastructure

4 weeks ago

San Francisco, United States Genmo Full time

We are Genmo, a research lab dedicated to building open, state-of-the-art models for video generation towards unlocking the right brain of AGI. Join us in shaping the future of AI and pushing the boundaries of what's possible in video generation. What You’ll Do: Own the design and day‑to‑day operation of GPU clusters that train and serve frontier...
Research Engineer

4 weeks ago

San Francisco, United States Storm3 Full time

Base pay range: $200,000.00/yr - $350,000.00/yr. This range is provided by Storm3. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Research Engineer - ML Sys, Infra Optimization and Scaling. Come join a revolutionary AI research lab in SF Bay Area that is...
Software Engineer, GPU Infrastructure

2 weeks ago

San Francisco, CA, United States OpenAI Full time

This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world's largest, most reliable, and frictionless GPU fleet to support OpenAI's general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...
Software Engineer, GPU Infrastructure

2 weeks ago

San Francisco, CA, United States OpenAI Full time

This role will support the fleet infrastructure team at OpenAI. The fleet team focuses on running the world's largest, most reliable, and frictionless GPU fleet to support OpenAI's general purpose model training and deployment. Work on this team ranges from: Maximizing GPUs doing useful work by building user-friendly scheduling and quota systems; Running a...
AI Infra Engineer

4 weeks ago

San Francisco, United States Pantera Capital Full time

Location: San Francisco. Employment Type: Full time. Location Type: Hybrid. Department: AI. We are looking for an AI Infra engineer to join our growing team. We work with Kubernetes, Slurm, Python, C++, and PyTorch, primarily on AWS. As an AI Infrastructure Engineer, you will be partnering closely with our Inference and Research teams to build, deploy, and optimize...
Senior Infrastructure Engineer

3 days ago

San Francisco, California, United States Aimhire Full time

Senior Infrastructure Engineer. San Francisco, CA (On-site) | Full-time | Visa Sponsorship Available. About the Role: Our client is looking for a Senior Infrastructure Engineer with 6+ years of experience scaling large, reliable systems at startups that have grown to hyperscale. The ideal candidate is deeply technical, high-agency, and thrives in fast-moving...
