Principal Staff Engineer - AI Infrastructure

1 week ago


San Francisco, CA, United States Andiamo Full time

Overview
Principal Staff Engineer - AI Infrastructure. We are seeking a Principal Staff Engineer to lead the architecture and development of our next-generation AI infrastructure. This role sits at the intersection of large-scale distributed systems and cutting-edge machine learning, powering the platforms that enable researchers and engineers to build, train, and deploy AI models at global scale.

As a senior technical leader, you will define architectural strategy, influence cross-organizational initiatives, and guide the design of highly reliable, efficient, and scalable systems. You'll balance deep technical execution with strategic vision: mentoring senior engineers, collaborating with AI researchers, and ensuring our infrastructure accelerates innovation while maintaining world-class reliability.

What You'll Do
  • Design & Scale AI Infrastructure: Architect and build distributed training, inference, and data pipelines that support large-scale AI workloads across GPUs and heterogeneous environments.
  • Lead Cloud-Native Innovation: Drive adoption of Kubernetes, Docker, and modern orchestration frameworks to optimize model deployment, resource allocation, and cluster utilization.
  • Optimize Performance at Scale: Develop high-throughput, low-latency services and memory-efficient systems to support petabyte-scale data and massive model sizes.
  • Advance Observability & Reliability: Implement monitoring, tracing, and fault-tolerance strategies to ensure resilient AI systems in production.
  • Collaborate with Research & Product: Partner with ML scientists, product engineers, and platform teams to design infrastructure that accelerates experimentation and model iteration.
  • Mentor & Inspire: Support the technical growth of senior engineers, fostering a culture of excellence, innovation, and ownership.
  • Shape Technical Strategy: Define long-term roadmaps for AI infrastructure, balancing near-term delivery with foundational investments in scalability, efficiency, and reliability.

What We're Looking For
  • Extensive Experience: 10+ years in distributed systems, large-scale infrastructure, or platform engineering, with experience supporting AI/ML workloads strongly preferred.
  • Programming Mastery: Deep expertise in Java, Python, or C++, with a proven ability to build performant and reliable systems.
  • AI/ML Infrastructure Knowledge: Familiarity with ML frameworks (TensorFlow, PyTorch, JAX), distributed training strategies, GPU scheduling, and data pipeline optimization.
  • Modern Infrastructure Skills: Hands-on experience with Kubernetes, Docker, CI/CD pipelines, cloud platforms (AWS/GCP/Azure), and observability tools (Prometheus, Grafana, Datadog).
  • Systems Design Expertise: Strong foundation in algorithms, concurrency, and systems architecture for high-scale, fault-tolerant environments.
  • Leadership & Influence: Demonstrated success driving cross-functional initiatives, mentoring senior engineers, and setting engineering-wide standards.
  • Product Mindset: Ability to balance technical rigor with usability and speed, ensuring infrastructure empowers rapid iteration and impactful outcomes.

About Andiamo
Andiamo is a globally recognized staffing and consulting firm specializing in placing the top 2% of technology and go-to-market professionals with the world's largest and most well-known companies. For over 20 years, we've maintained tier-one vendor status with firms such as Amazon, Bloomberg, Palantir, MasterCard, Visa, Two Sigma, and Citadel, as well as other major financial services firms, elite hedge funds, Google-backed tech start-ups, and major software firms. Our talent solutions include Permanent Placement, Contract Staffing, Executive Search, and Dedicated Recruiting Services (RPO).



  • San Francisco, CA, United States Block Full time

    Staff/Principal Software Engineer, CI Infrastructure. The blocks that form our foundational teams (People, Finance, Counsel, Hardware, Information Security, Platform Infrastructure Engineering, and more) provide support and...


  • San Francisco, California, United States AdsGency AI Full time $180,000 - $300,000 per year

    Principal Member of Technical Staff – AI Systems & Infrastructure. Company: AdsGency AI. Location: Onsite (San Francisco City). Employment Type: Full-Time. Relocation to San Francisco City Required. We Sponsor OPT / CPT / STEM-OPT / Second-Year H1B. About AdsGency AI: We're AdsGency AI, an AI-native startup building a multi-agent automation layer for digital...


  • San Francisco, United States Snorkel AI Full time

    Principal Software Engineer – AI Platform. About Snorkel: At Snorkel, we believe meaningful AI doesn’t start with the model, it starts with the data. We’re on a mission to help enterprises transform expert knowledge into specialized AI at scale. The AI landscape has gone...


  • San Francisco, United States The Rundown AI, Inc. Full time

    We are looking for a Principal Software Engineer to shape our product and technical systems to meet the AI challenges of today and tomorrow. As a Principal Software Engineer, you’ll work across teams and across the stack to deliver major new features and infrastructure, to improve our practices and culture, and to align Engineering on a shared technical...


  • San Francisco, United States Snorkel AI Full time

    About Snorkel At Snorkel, we believe meaningful AI doesn’t start with the model, it starts with the data. We’re on a mission to help enterprises transform expert knowledge into specialized AI at scale. The AI landscape has gone through incredible changes between 2015, when Snorkel started as a research project in the Stanford AI Lab, to the generative AI...

  • AI Engineer, AIOps

    1 day ago


    San Francisco, United States Eloquent AI Full time

    Meet Eloquent AI At Eloquent AI, we’re building the next generation of AI Operators—multimodal, autonomous systems that execute complex workflows across fragmented tools with human-level precision. Our technology goes far beyond chat: it sees, reads, clicks, types, and makes decisions—transforming how work gets done in regulated, high-stakes...


  • San Francisco, United States Stack AI, Inc. Full time

    About the Role We’re hiring an AI Infrastructure Engineer to shape and scale the backend systems that power our AI platform. As a Series A company, your work will be foundational, enabling safe, efficient, and reliable AI workflows from end to end. What You’ll Do Design and implement scalable backend architectures for AI workloads (inference,...


  • San Francisco, CA, United States Block Full time

    Staff/Principal Software Engineer, CI Infrastructure. Remote, Bay Area, CA, US. Posted Date: 10/16/25. Block is one company built from many blocks, all united by the same purpose of economic empowerment. The blocks that form our foundational teams (People, Finance, Counsel, Hardware, Information Security, Platform Infrastructure Engineering, and more) provide...


  • San Francisco, United States Together AI Full time

    Staff Engineer, Distributed Storage and HPC & AI Infrastructure About the Role In this role, you will design and deliver multi-petabyte storage systems purpose-built for the world’s largest AI training and inference workloads. You’ll architect high-performance parallel filesystems and object stores, evaluate and integrate cutting‑edge technologies such...

