Infrastructure Engineer

1 day ago


San Francisco, CA, United States · Replicate, Inc. · Full time

At Replicate, we believe AI shouldn’t be exclusive to tech giants — it should be accessible to every software developer. Our goal is straightforward: build the best platform for creating, deploying, and running machine learning models. As an Infrastructure Engineer on the Platform team, you’ll play a key role in making generative AI available to everyone.

The Platform team at Replicate oversees the entire lifecycle of models, from packaging and deployment to serving, scaling, and monitoring. You’ll be developing the infrastructure that supports thousands of models and powers millions of predictions daily. This is a chance to build something truly innovative, where each decision you make has a tangible impact and allows your creativity to shine.

What you’ll be doing:
  1. Designing and building our deployment and model-serving platform.
  2. Building technology to operate the latest advancements in the ML and AI space.
  3. Designing systems to maximize the utilization and reliability of our Kubernetes clusters and GPUs, including multi-regional traffic shifting and failover capabilities.
  4. Owning and optimizing fair and reliable task allocation and queuing across a diverse set of customers with heterogeneous workloads.
  5. Working with our Models team to speed up model inference through techniques like caching, weights management, machine configurations, and runtime optimizations in Python and PyTorch.

Working with technologies such as:

  • Python, Go, and Node.js
  • Kubernetes and Terraform
  • Redis, Google BigQuery, and PostgreSQL

We’re looking for the right person, not just someone who checks boxes, but it’s likely you have…
  1. Experience building platforms at scale.
  2. Experience working in complex systems with many moving parts, and opinions on monoliths vs. services.
  3. Experience designing and implementing developer-friendly APIs that enable scalable and reliable integration.
  4. Hands-on experience setting up and operating Kubernetes.
  5. A passion for building tools that empower developers.
  6. Strong communication and collaboration skills, with the ability to understand customer needs and distill complex topics into clear, actionable insights.
  7. At least 3 years of full-time software engineering experience.

These aren’t hard requirements, but we definitely want to talk with you if…
  1. You have worked on machine learning platform teams in the past.
  2. You have experience working with or on teams that have put ML/AI into production, even though this role does not entail building ML models directly.
  3. You have some exposure to serving generative AI features, where GPUs are a costly resource and workloads can take a long time to finish.

You'll be working from our beautiful office in the Mission, San Francisco for this role. We want to build a strong in-person culture for the people who are there. We want you to be there, not feel like we have to drag you in.

Salary: $200k - $280k USD

Apply now

Name: Required

Email: Required

Phone number:

City:

Country:

Resume: If you haven't got a resume, a LinkedIn profile, GitHub profile, or some plain text is fine too.

LinkedIn profile:

Can you work from our office in San Francisco at least 3 days a week? Required

Yes / I'm willing to relocate / No

Can you legally work in the United States? Required

Yes / No

Do you have at least 3 years of full-time software engineering experience? Required

Yes / No

Have you worked on building platforms? Required

Do you have experience working on teams that have built and shipped machine learning models? Required (An answer is required. The experience itself is not, but we would love to know if you have it.)

