Senior Infrastructure Software Engineer

1 week ago


San Francisco, California, United States | Anthropic Limited | Full time

Position Overview:

Anthropic Limited is seeking skilled, experienced Infrastructure Engineers to strengthen our ability to develop, scale, and maintain innovative AI systems. As part of our Infrastructure division, you will work with pioneering AI technologies and play a vital role in advancing our mission to build safe, dependable AI systems that serve humanity.

Current Opportunities:
  • Data Infrastructure: The Data Infrastructure team designs, builds, and maintains the data systems that power our AI research and products. You will work with teams across the company to understand their data needs, deliver efficient and reliable data solutions, and continuously improve our data infrastructure. Your responsibilities will include building and optimizing data pipelines, applying data governance best practices, monitoring and troubleshooting, and setting technical strategy for high-scale, reliable data systems and pipelines. You will use technologies such as Spark, Airflow, dbt, and cloud services from GCP and AWS, and establish processes that keep the team operating effectively and improving over time.

  • Research Infrastructure: The Research Infrastructure team builds and scales the systems that let researchers iterate quickly, and brings critical systems and components used during development up to production scale as our model footprint grows.

  • Site Reliability Engineering: As an SRE at Anthropic, you will design and implement scalable solutions, collaborate with development teams to improve infrastructure reliability, and set up monitoring systems, SLOs, and SLIs. You will apply fault-tolerant design patterns, build automation tools, and participate in an on-call rotation. Applying Infrastructure as Code principles, you will work with cross-functional teams to ensure that new features and services are reliable and scalable, and you will improve reliability across engineering through better tooling.

  • Systems: The Systems team supports some of the largest and most advanced clusters in the industry, used for research, training, and ultimately serving AI models. Your contributions will be essential to ensuring Anthropic can reliably and safely train frontier models. You will build systems and operate large Kubernetes clusters running GPU, TPU, and Trainium workloads.

  • Observability: The Observability team designs, builds, and maintains the observability infrastructure that ensures the reliability, performance, and efficiency of our AI systems and services. You will work with teams across the company to understand their observability needs and deliver solutions using technologies such as Prometheus, Splunk, Cloud Logging, Grafana, and Honeycomb. Your work will include developing a configuration-driven approach to managing dashboards and alerts, implementing structured logging and tracing, optimizing the observability stack, and building a reliable system that requires minimal maintenance. By providing managed, centralized, and user-friendly observability tools, you will promote a culture of operational excellence, proactive monitoring, and continuous improvement.

Key Responsibilities:
  • Lead the development of industry-leading AI clusters (ranging from thousands to hundreds of thousands of machines), working closely with cloud service providers on cluster build-outs and the features those clusters require.
  • Engage with various stakeholders to thoroughly understand infrastructure, data, and compute requirements, identifying potential solutions to support advanced research and product development.
  • Establish technical strategy and oversee the creation of high-scale, reliable infrastructure systems.
  • Mentor top technical talent within the organization.
  • Design processes (e.g., postmortem reviews, incident response, on-call rotations) that support effective team operations and prevent recurring failures.

You may be a good fit if you:
  • Possess 8+ years of relevant industry experience, with 3+ years leading large-scale, complex projects or teams as an engineer or technical lead.
  • Are passionate about distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement.
  • Demonstrate strong proficiency in at least one programming language (e.g., Python, Rust, Go, Java).
  • Have excellent problem-solving abilities and the capacity to work independently.
  • Are committed to supporting internal partners, such as research teams, and to understanding their needs.
  • Possess outstanding communication skills to build consensus with stakeholders, both internally and externally.
  • Have in-depth knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.

Preferred Qualifications:
  • Expertise in security and privacy best practices.
  • Experience with machine learning infrastructure, including GPUs, TPUs, or Trainium, as well as the supporting networking and communication stack, such as NCCL.
  • Familiarity with low-level systems, such as Linux kernel tuning and eBPF.
  • The ability to quickly understand systems design trade-offs and keep pace with rapidly evolving software systems.

Application Deadline: None. Applications will be reviewed on a rolling basis.