Sr. Software Engineer, ML Edge Inference Engineer

1 week ago


San Francisco, CA, United States Serve Robotics Full time

Base pay range: $190,000.00/yr - $240,000.00/yr

At Serve Robotics, we're reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It's designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles, Miami, Dallas, Atlanta and Chicago while doing commercial deliveries. We're looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

We are seeking a highly skilled Sr. Software Engineer, ML Edge Inference Engineer to join our robotics team. This technical role bridges the gap between ML research and real-time deployment, enabling advanced ML models to run efficiently on edge hardware such as NVIDIA Jetson platforms. You will work closely with ML researchers, embedded systems engineers, and robotics software teams to ensure that state-of-the-art models can be deployed with optimal performance on robotic platforms.

Responsibilities

  • Own the full lifecycle of ML model deployment on robots, from handoff by the ML team to full system integration.
  • Convert, optimize, and integrate trained models (e.g., PyTorch/ONNX/TensorRT) for Jetson platforms using NVIDIA tools.
  • Develop and optimize CUDA kernels and pipelines for low-latency, high-throughput model inference.
  • Profile and benchmark existing ML workloads using tools like Nsight, nvprof, and TensorRT profiler.
  • Identify and remove compute and memory bottlenecks for real-time inference.
  • Design and implement strategies for quantization, pruning, and other model compression techniques suited for edge inference.
  • Ensure models are robust to the resource constraints of real-time, low-power robotic systems.
  • Manage memory layout, concurrency, and scheduling for optimized GPU and CPU usage on Jetson devices.
  • Build benchmarking pipelines for continuous performance evaluation on hardware-in-the-loop systems.
  • Collaborate with QA and systems teams to validate model behavior in field scenarios.
  • Work closely with ML researchers to influence model architectures for edge deployability and provide technical guidance on the feasibility of real-time ML models in the robotics stack.

Qualifications

  • Bachelor's degree in Computer Science, Robotics, Electrical Engineering, or equivalent field.
  • 5+ years of experience in deploying ML models on embedded or edge platforms (preferably robotics).
  • 3+ years of experience with CUDA, TensorRT, and other NVIDIA acceleration tools.
  • Proficient in Python and C++, especially for performance-sensitive systems.
  • Experience with NVIDIA Jetson (e.g., Xavier, Orin) and edge inference tools.
  • Familiarity with model conversion workflows (e.g., PyTorch → ONNX → TensorRT).

Please note: The base salary range listed in this job description reflects compensation for candidates based in the San Francisco Bay Area. While we prefer candidates located in the Bay Area, we are also open to qualified talent working remotely across the United States. Base salary range (U.S. all locations): $180,000 - $205,000.

What Makes You Stand Out

  • Master's degree in Computer Science, Robotics, Electrical Engineering, or equivalent field.
  • Experience with real-time robotics systems (e.g., ROS2, middleware, safety-critical constraints and Linux embedded systems).
  • Knowledge of performance tuning under thermal, power, and memory constraints on embedded devices.
  • Experience with model quantization (e.g., INT8), sparsity, and latency-aware model design.
  • Contributions to open-source ML or CUDA projects are a plus.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Technology, Information and Internet



  • San Francisco, United States Serve Robotics Full time

    Sr. Software Engineer, ML Edge Inference Engineer. Base pay range $190,000.00/yr - $240,000.00/yr. At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away...


  • San Francisco, United States Serve Robotics Full time

    At Serve Robotics, we're reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It's designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los...


  • San Francisco, CA, United States Serve Robotics Full time

    Software Engineer, ML Edge Inference Engineer role at Serve Robotics. At Serve Robotics, we're reimagining how things move in cities. It's designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses. The Serve fleet has been delighting...

  • Software Engineer

    4 weeks ago


    San Francisco, United States Alldus Full time

    My client is searching for a talented engineer to work on ML/LLM inference and serving. They specialize in developing next-gen LLM fine-tuning and inference...

  • Staff ML Engineer

    2 weeks ago


    San Francisco, CA, United States Strativ Group Full time

    Staff ML Engineer - Infra & Inference We are partnered with a Stealth AI Infra startup (backed by a Tier 1 AI Lab and advised by 2 of the world's most prominent ML thought-leaders), who are hiring a Staff Engineer (genuine progression to HoE / Chief Engineer). The business already have enterprise customer traction & are backed by Perplexity and the VC who...


  • San Francisco, United States Serve Robotics Full time

    A pioneering robotics firm in San Francisco seeks a skilled Sr. Software Engineer specializing in ML Edge Inference. This role requires a strong background in deploying ML models on NVIDIA Jetson platforms, optimizing performance for robotics systems, and working collaboratively with cross-functional teams. The successful candidate will tackle challenges in...


  • San Francisco, United States Waymo Full time

    Software Engineer, ML Inference, Simulation Infrastructure. Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google...

  • Staff ML Engineer

    2 weeks ago


    San Francisco, United States Strativ Group Full time

    Staff ML Engineer - Infra & Inference We are partnered with a Stealth AI Infra startup (backed by a Tier 1 AI Lab and advised by 2 of the world's most prominent ML thought-leaders), who are hiring a Staff Engineer (genuine progression to HoE / Chief Engineer). The business already have enterprise customer traction & are backed by Perplexity and the VC who led...