Senior/Principal Software Engineer

4 weeks ago


San Francisco, United States Understanding Recruitment Full time

Senior/Principal Software Engineer (Distributed Systems, ML Training)

Are you passionate about building scalable systems that power the future of AI? We're seeking a highly motivated Senior/Principal Software Engineer to drive innovation in our distributed machine learning infrastructure. As a leader in advanced machine learning compute solutions, we bridge the gap between theoretical AI and practical applications, ensuring complex models can be trained efficiently and at scale.

In this role, you'll be a key player in architecting and developing our distributed training frameworks, enabling rapid iteration and deployment of cutting-edge machine learning models. Your work will directly impact the performance and capabilities of our next-generation AI technologies.

What We Offer

  • Competitive compensation package with comprehensive benefits
  • Generous professional development opportunities (conferences, training, etc.)
  • Access to state-of-the-art tools and technologies
  • Collaborative team environment with regular retreats
  • Opportunity to publish and present your work

Key Responsibilities

  • Design, develop, and optimize distributed training frameworks for large-scale machine learning models
  • Lead technical efforts to improve the scalability, performance, and efficiency of our compute infrastructure
  • Collaborate with research teams to integrate cutting-edge algorithms and models into our systems
  • Drive innovation in areas such as model parallelism, data parallelism, and hybrid approaches
  • Mentor and guide junior engineers, fostering a culture of technical excellence

We are looking for individuals with a strong software engineering background and demonstrated expertise in distributed systems and cloud infrastructure (AWS, GCP, Azure, etc.). You should have a proven track record of building and deploying large-scale machine learning systems. A deep understanding of distributed training frameworks (e.g., Horovod, PyTorch DDP, TensorFlow Distributed) is essential. Proficiency in Python and/or C++ is required, along with excellent communication and collaboration skills.

Experience with high-performance computing (HPC) clusters, GPU acceleration, or optimization techniques for distributed training would be a plus. Additionally, a background in machine learning research or development, or contributions to open-source projects or publications in relevant fields, would be highly valued.

Keywords:

Distributed computing, parallel processing, cluster computing, scalability, multi-node architecture, data partitioning, load balancing, high-performance computing (HPC), fault tolerance, model parallelism, data parallelism, Horovod, PyTorch DDP, TensorFlow Distributed, machine learning infrastructure, cloud-based ML, GPU/CPU collaboration.




  • San Francisco, California, United States Fastly Full time

    Fastly helps people stay better connected with the things they love. Fastly's edge cloud platform enables customers to create great digital experiences quickly, securely, and reliably by processing, serving, and securing our customers' applications as close to their end-users as possible — at the edge of the Internet. The platform is designed to take...


  • San Francisco, California, United States Palo Alto Networks Full time

    Principal Software Engineer - Join Our Cybersecurity Team at Palo Alto Networks. Company Description. Our Mission: At Palo Alto Networks, our mission is to be the trusted cybersecurity partner that safeguards our digital way of life. We envision a world where each day is safer and more secure, driven by innovation and disruption in the cybersecurity landscape. Our...


  • San Jose, California, United States Siemens Digital Industries Software Full time

    Job Family: Research & Development Req ID: Siemens Digital Industries Software is a leading provider of solutions for the design, simulation, and manufacture of products across many different industries. Formula 1 cars, skyscrapers, ships, space exploration vehicles, and many of the objects we see in our daily lives are being conceived and manufactured using...


  • San Francisco, United States BHO Tech Full time

    We’re looking for a principal software engineer to lead architecture and development of our next generation financial infrastructure platform built on bleeding edge technologies with distributed systems architecture. We are not shy to fail fast and learn quickly. We are a passionate team working on building customer-centric, mission-critical, highly...


  • San Francisco, California, United States bodo Full time

    At Bodo, we are driven by a mission to revolutionize how organizations harness the power of data by democratizing efficient compute at scale. With the creation of the first compute engine that brings HPC levels of performance and efficiency to large-scale data processing, we have already helped some of the most data-forward companies in the world with their...


  • San Francisco, California, United States Oracle Full time

    We are looking for a skilled Senior Principal Software Development Engineer to join our OCI Compute team. Our main focus is on creating and scaling services that allow customers to provision and manage both Bare Metal and Virtual Machine Compute instances. Key Responsibilities: Developing and maintaining highly available APIs for launching and managing...


  • San Diego, United States Intuit Full time

    Technology leaders at Intuit think strategically and drive for results. They build high performing teams by putting the right people in the right job at the right time. Leaders help to innovate by thinking differently. They lead their teams to embrac Principal Software Engineer, Software Engineer, Leader, Principal, Engineer, Technology, Software


  • San Francisco, United States Burq, Inc. Full time

    About Burq Burq started with an ambitious mission: how can we turn the complex process of offering delivery into a simple turnkey solution. We started with building the largest network of delivery networks, partnering with some of the biggest delivery companies. We then made it extremely easy for businesses to plug into our network and start offering...


  • San Francisco, United States Social Finance Ltd Full time

    Employee Applicant Privacy Notice Who we are: Shape a brighter financial future with us. Together with our members, we're changing the way people think about and interact with personal finance. We're a next-generation financial services company and national bank using innovative, mobile-first technology to help our millions of members reach their goals. The...


  • San Francisco, California, United States ARC Technologies Full time

    Arc is the future of startup finance. Arc helps startups grow through its integrated cash management and capital markets platform. With Arc, companies don't need to choose between safety, liquidity, and returns; they get all three in one software platform. Startups can access venture debt and working capital, deposit funds into FDIC insurance eligible...


  • San Diego, United States Cubic Full time

    This is a contingent position 6-8 months. Performs complex software engineering tasks. Provides technical software expertise to research, design, develop and test engineering activities. Reviews project progress and evaluates results. Estimates costs Software Engineer, Principal Software Engineer, Software, Engineer, Development, Technical, Technology,...


  • San Antonio, United States Northrop Grumman Full time

    Join Northrop Grumman on our continued mission to push the boundaries of possible across land, sea, air, space, and cyberspace. Enjoy a culture where your voice is valued and start contributing to our team of passionate professionals providing real-life solutions to our world's biggest challenges. We take pride in creating purposeful work and allowing our...


  • San Francisco, California, United States Palo Alto Networks Full time

    Company Description. Our Mission: At Palo Alto Networks, everything starts and ends with our mission: being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and...


  • San Francisco, United States Otter Full time

    Who We Are CloudKitchens helps restaurateurs around the world succeed in online food delivery - our goal is to make food more affordable, higher quality and convenient for everyone. We take underutilized properties and transform them into smart kitchens so they can better serve restaurateurs, customers and the neighborhoods they’re in. Every time we launch...


  • San Mateo, United States Manticore Games Inc. Full time

    Important PSA: we've received reports of scammers who are posing as Manticore recruiters using a fraudulent email address. Legit emails from Manticore will always come from a manticoregames.com address (note the plural 'games', not singular). For real roles at Manticore, we'll communicate by official email. We don't use gmail addresses, and we don't ask...