Software Engineer

4 weeks ago


Pittsburgh, United States META Full time

Reality Labs Research (RL-R) brings together a diverse and highly interdisciplinary team of researchers and engineers to create the future of augmented and virtual reality. On the Codec Avatar ML Compute team, you'll build tools, libraries, and frameworks that help researchers collaborate and advance their work on generating Codec Avatars. Our team cultivates an honest and considerate environment where self-motivated individuals thrive. We encourage a strong sense of ownership and embrace the ambiguity that comes with working on the frontiers of research. In this software engineer role, you will serve as the point of contact for Meta's research GPU superclusters, managing and optimizing compute resources to enable groundbreaking research in relightable avatars, full-body avatars, and generative AI for Codec Avatars.

Software Engineer - Codec Avatar ML Compute Team Responsibilities

  • Build, scale, and secure the HPC clusters within Meta research labs, a heterogeneous environment containing diverse operating systems and applications
  • Provide on-call support and lead incident root cause analysis through multiple infrastructure layers (compute, storage, network) for HPC clusters and act as a final escalation point
  • Collaborate in a diverse team environment across multiple scientific and engineering disciplines, making the architectural tradeoffs required to rapidly deliver software and infrastructure solutions
  • Find ways to leverage the scale and complexity of the larger Meta production infrastructure to solve problems for Reality Labs researchers
  • Provide guidance to other engineers on best practices to build mature services which are highly available, reliable, secure, and scalable
  • Work independently, handle multiple large projects simultaneously, and prioritize the team roadmap and deliverables by balancing required effort against resulting impact

Minimum Qualifications

  • Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
  • Experience in automating the management of infrastructure and services
  • 3+ years of experience in distributed system performance measurement, logging, and optimization
  • 3+ years of experience coding in at least one of the following languages: C++, Python, Rust, or Go
  • Thorough understanding of Linux operating system internals, including the networking subsystem
  • Experience with Python package and environment management tools such as Conda or venv
  • Experience in writing system level infrastructure, libraries, and applications
  • Experience with software development practices such as source control, code reviews, unit testing, debugging and profiling
  • Proven track record of shipping software
  • Experience in developing performant software and systems

Preferred Qualifications

  • Experience managing HPC schedulers or orchestrators such as Slurm, Kubernetes, or LSF
  • Prior experience in building out HPC clusters, handling compute, storage, network, operating systems, schedulers, and stakeholder discussions
  • Prior experience in cluster on-call operations, including troubleshooting server/scheduler/storage errors, maintaining compute/storage environments/libraries/tools, helping onboard users to the cluster, and answering general questions from users
  • Prior experience in cluster coordination and strategy planning, including collecting/understanding needs of users, developing tools to improve user experience, providing guidance on best practices, coordinating distribution of compute/storage resources, forecasting compute/storage needs, and developing long-term user experience/compute/storage strategies
  • Prior experience building tooling for monitoring and telemetry
  • Prior experience supporting configuration management in a multi-region environment
  • Prior experience optimizing multi-tenant HPC clusters for performance and maintenance
  • Prior experience with containerization or virtualization technologies such as Docker or virtual machines
  • Prior experience building services
  • Prior experience building PaaS or internal clouds
  • Prior experience in developing/managing distributed network file systems
  • Prior academic or development experience with machine learning and/or deep learning
  • Prior experience in ML libraries such as PyTorch, TensorFlow or cuDNN
  • Prior experience in GPGPU development with CUDA, OpenCL or DirectCompute
  • Prior experience in network security
  • Experience in database and data management systems at scale
  • Familiarity with Linux observability tooling, such as eBPF

