Principal Staff Software Engineer, AI Training Infrastructure

3 days ago


Mountain View, California, United States | LinkedIn | Full time

About the Role

We are seeking a highly skilled Principal Staff Software Engineer to join our AI Training Infrastructure team at LinkedIn. As a key member of the team, you will design and implement high-performance AI training pipelines and data I/O, and work with open-source teams to resolve issues in popular libraries such as Hugging Face, Horovod, and PyTorch.

Responsibilities
  • Design and implement large-scale distributed training for personalized recommendation and large language models.
  • Improve the observability and understandability of our systems, with a focus on developer productivity and system maintainability.
  • Mentor other engineers, define our challenging technical culture, and help build a fast-growing team.
  • Work closely with the open-source community to participate in and influence cutting-edge open-source projects.
  • Serve as the tech lead for several concurrent key initiatives for the Training Infrastructure team and define the future of AI training platforms.

Requirements
  • BS/BA in Computer Science or related technical field or equivalent technical experience.
  • 7+ years of industry experience in software design, development, and algorithm-related solutions.
  • 7+ years of experience programming in object-oriented languages such as Python, C++, Java, Go, Rust, or Scala.
  • 5+ years of experience as an architect or in a technical leadership position.
  • 5+ years of experience in the industry with leading/building deep learning systems.
  • Hands-on experience developing distributed systems or other large-scale systems.

Preferred Qualifications
  • MS or PhD in Computer Science or related technical discipline.
  • 12+ years of experience in software design, development, and algorithm-related solutions with at least 5 years of experience in a technical leadership position.
  • 12+ years of experience in an object-oriented programming language such as Python, C++, Java, Go, Rust, or Scala.
  • 5+ years of experience with large-scale distributed systems and client-server architectures.
  • Co-author or maintainer of any open-source projects.
  • Expertise in machine learning infrastructure, including technologies such as MLflow and Kubeflow, and in large-scale distributed systems.
  • Familiarity with containers and container orchestration systems.
  • Expertise in deep learning frameworks and tensor libraries such as PyTorch, TensorFlow, and JAX/Flax.

What We Offer

At LinkedIn, we offer a competitive compensation package that may include an annual performance bonus, stock, benefits, and/or other applicable incentive compensation plans. We are committed to fair and equitable compensation practices.

For more information, visit https://careers.linkedin.com/benefits.

Equal Opportunity Statement

LinkedIn is committed to diversity in its workforce and is proud to be an equal opportunity employer. We consider qualified applicants without regard to race, color, religion, creed, gender, national origin, age, disability, veteran status, marital status, pregnancy, sex, gender expression or identity, sexual orientation, citizenship, or any other legally protected class.

LinkedIn is an Affirmative Action and Equal Opportunity Employer as described in our equal opportunity statement here: https://microsoft.sharepoint.com/:b:/t/LinkedInGCI/EeE8sk7CTIdFmEp9ONzFOTEBM62TPrWLMHs4J1C_QxVTbg?e=5hfhpE.

Please reference https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf and https://www.dol.gov/ofccp/regs/compliance/posters/pdf/OFCCP_EEO_Supplement_Final_JRF_QA_508c.pdf for more information.

LinkedIn is committed to offering an inclusive and accessible experience for all job seekers, including individuals with disabilities. Our goal is to foster an inclusive and accessible workplace where everyone has the opportunity to be successful.

If you need a reasonable accommodation to search for a job opening, apply for a position, or participate in the interview process, connect with us at accommodations@linkedin.com and describe the specific accommodation requested for a disability-related limitation.

Reasonable accommodations are modifications or adjustments to the application or hiring process that would enable you to fully participate in that process. Examples of reasonable accommodations include but are not limited to:

  • Documents in alternate formats or read aloud to you
  • Having interviews in an accessible location
  • Being accompanied by a service dog
  • Having a sign language interpreter present for the interview

A request for an accommodation will be responded to within three business days. However, non-disability-related requests, such as following up on an application, will not receive a response.

LinkedIn will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by LinkedIn, or (c) consistent with LinkedIn's legal duty to furnish information.

Pay Transparency Policy Statement

As a federal contractor, LinkedIn follows the Pay Transparency and non-discrimination provisions described at this link: https://lnkd.in/paytransparency.

Global Data Privacy Notice for Job Candidates

This document provides transparency around the way in which LinkedIn handles personal data of employees and job applicants: https://lnkd.in/GlobalDataPrivacyNotice.

