Infrastructure Engineer
2 weeks ago
At Replicate, we believe AI shouldn’t be exclusive to tech giants — it should be accessible to every software developer. Our goal is straightforward: build the best platform for creating, deploying, and running machine learning models. As an Infrastructure Engineer on the Platform team, you’ll play a key role in making generative AI available to everyone.
The Platform team at Replicate oversees the entire lifecycle of models, from packaging and deployment to serving, scaling, and monitoring. You’ll be developing the infrastructure that supports thousands of models and powers millions of predictions daily. This is a chance to build something truly innovative, where each decision you make has a tangible impact and allows your creativity to shine.
What you’ll be doing:
- Designing and building our deployment and model-serving platform.
- Building technology to operate the latest advancements in the ML and AI space.
- Designing systems to maximize the utilization and reliability of our Kubernetes clusters and GPUs, including multi-regional traffic shifting and failover capabilities.
- Owning and optimizing fair and reliable task allocation and queuing across a diverse set of customers with heterogeneous workloads.
- Working with our Models team to speed up model inference through techniques like caching, weights management, machine configurations, and runtime optimizations in Python and PyTorch.
Working with technologies such as:
- Python, Go, and Node.js
- Kubernetes and Terraform
- Redis, Google BigQuery, and PostgreSQL
What we’re looking for:
- Experience building platforms at scale.
- Worked in complex systems with many moving parts; you have opinions on monoliths vs. services.
- Designed and implemented developer-friendly APIs to enable scalable and reliable integration.
- Hands-on experience setting up and operating Kubernetes.
- A passion for building tools that empower developers.
- Strong communication and collaboration skills, with the ability to understand customer needs and distill complex topics into clear, actionable insights.
- At least 3 years of full-time software engineering experience.
- You have worked on machine learning platform teams in the past.
- You have experience working with or on teams that have put ML/AI into production, even though this role does not entail building ML models directly.
- You have some exposure to serving Generative AI features where GPUs are costly commodities and workloads can take significant time to finish.
You'll be working from our beautiful office in the Mission, San Francisco for this role. We want to build a strong in-person culture for the people who are there. We want you to be there, not feel like we have to drag you in.
Salary: $200k - $280k USD
Apply now
Name: Required
Email: Required
Phone number:
City:
Country:
Resume: If you haven't got a resume, a LinkedIn profile, GitHub profile, or some plain text is fine too.
LinkedIn profile:
Can you work from our office in San Francisco at least 3 days a week? Required
Yes / I'm willing to relocate / No
Can you legally work in the United States? Required
Yes / No
Do you have at least 3 years of full-time software engineering experience? Required
Yes / No
Have you worked on building platforms? Required
Do you have experience working on teams that have built and shipped machine learning models? Required (This is not required, but would love to know if you do)