Current jobs related to Senior Infrastructure Software Engineer - San Francisco, California - Anthropic Limited


  • San Francisco, California, United States Baton (A Ryder Technology Lab) Full time

    Job Title: Senior Software Engineer - Infrastructure. Baton, a technology innovation lab for Ryder, is seeking a highly skilled Senior Software Engineer to join our Infrastructure team. As a key member of our team, you will play a crucial role in designing and implementing robust testing infrastructure that enhances engineering productivity and sets a high...


  • San Francisco, California, United States Baton (A Ryder Technology Lab) Full time

    Job Title: Senior Software Engineer - Infrastructure. We are seeking a highly skilled Senior Software Engineer to join our Infrastructure team at Baton, a technology innovation lab for Ryder. As a key member of our team, you will be responsible for designing and developing our core web infrastructure, ensuring scalability, reliability, and security. Key...


  • San Francisco, California, United States Baton Full time

    Job Description: Baton is a technology innovation lab for Ryder, a leading logistics company. We're seeking a Senior Software Engineer to join our Infrastructure team. This role involves creating robust testing infrastructure to enhance engineering productivity and set a high standard for code quality. You'll work with our Head of Engineering to enable our...


  • San Francisco, California, United States Pomelo Full time

    About the Role: Pomelo is a financial technology platform that combines consumer credit and global remittances. We're looking for a skilled Senior Software Engineer, Infrastructure to join our team in San Francisco. As a vital member of our Infrastructure team, you'll play a key role in building and maintaining the core systems that keep our platform reliable,...


  • San Francisco, California, United States Acceler8 Talent Full time

    About the Role: We are seeking a Senior Software Engineer (AI Infrastructure / MLOps) to join our pioneering AI startup focused on enhancing data quality for machine learning. This role offers the chance to work on large-scale web applications and tackle complex challenges in a rapidly growing field. As a Senior Software Engineer (AI Infrastructure / MLOps),...


  • San Francisco, California, United States Informal Systems Full time

    Job Overview: Informal Systems is a pioneering company in the field of blockchain technology, specializing in the security of interoperable, fault-tolerant networks. We are seeking a highly skilled Senior Software Engineer to join our team as a Blockchain Infrastructure Engineer, focusing on our core staking operations and service offerings on Ethereum,...


  • San Francisco, California, United States Acceler8 Talent Full time

    About the Role: We are seeking a highly skilled Senior Software Engineer to join our team as an AI Infrastructure Specialist. This role offers the opportunity to work on large-scale web applications and tackle complex challenges in a rapidly growing field. As a Senior Software Engineer, you will be responsible for developing and maintaining our flagship web...


  • San Francisco, California, United States Deepscribe Full time

    About the Role: We are seeking a Senior Software Engineer to join our ML Infrastructure team at DeepScribe. As a key member of our team, you will be responsible for building and optimizing infrastructure for audio processing, transcription, and LLM orchestration, ensuring scalability, reliability, and performance. You will collaborate with product and AI...


  • San Francisco, California, United States Rippling Full time

    Senior Staff Software Engineer - Infrastructure Lead. About Rippling: Rippling is a leading provider of cloud-based human capital management (HCM) solutions, offering a comprehensive platform for businesses to manage their workforce, payroll, benefits, and other HR-related tasks. Our mission is to empower organizations to streamline their operations, improve...


  • San Francisco, California, United States Parafin Inc Full time

    About Us: At Parafin, we're dedicated to empowering small businesses to grow and thrive. Our mission is to provide innovative financial services that make a real difference in the lives of entrepreneurs and their communities. We're a team of passionate individuals who share a common goal: to build a platform that enables small businesses to access the...


  • San Francisco, California, United States Acceler8 Talent Full time

    About the Role: We are seeking a highly skilled Senior Software Engineer to join our pioneering AI startup, specializing in enhancing data quality for machine learning. This role offers the opportunity to work on large-scale web applications and tackle complex challenges in a rapidly growing field. As a Senior Software Engineer, you will be responsible for...


  • San Francisco, California, United States HashiCorp Full time

    About Us: HashiCorp is a leading provider of cloud infrastructure management solutions. Our team is dedicated to delivering innovative products that enable organizations to manage their cloud, private datacenter, and SaaS infrastructure with ease. About the Role: We are seeking a highly skilled Senior Engineer to join our Terraform Enterprise team. As a key...


  • San Francisco, California, United States Triunity Software Full time

    Job Title: Senior Java Software Engineer. We are seeking a highly skilled Senior Java Software Engineer to join our team at Triunity Software. Key Responsibilities: design, develop, and test complex software applications using Java; collaborate with cross-functional teams to identify and prioritize project requirements; develop and maintain high-quality,...


  • San Francisco, California, United States Slapdash Full time

    Join Our Team as a Software Engineer - Infrastructure. We are seeking an experienced Software Engineer - Infrastructure to join our team at Slapdash. As a key member of our engineering team, you will be responsible for building and maintaining our cloud infrastructure, ensuring it is scalable, secure, and optimized for performance. About the Role: Design and...


  • San Francisco, California, United States Succinct Full time

    About the Role: We are seeking a highly skilled Senior Software Engineer to join our team at Succinct. As a key member of our infrastructure team, you will be responsible for designing and maintaining a highly available and scalable distributed system for our SP1 and prover network. Key Responsibilities: Architect and maintain a distributed system for...


  • San Francisco, California, United States Crusoe Full time

    About the Role: We are seeking a Senior/Staff Software Engineer to join our team at Crusoe Energy, a company on a mission to unlock value in stranded energy resources through the power of computation. As a key member of our engineering team, you will design and develop internal admin tooling and infrastructure management systems for Crusoe Cloud, a leading...


  • San Francisco, California, United States Crusoe Full time

    About the Role: As a Senior/Staff Software Engineer on the Managed AI team at Crusoe, you'll have a pivotal role in shaping the architecture and scalability of our next-generation AI inference platform. You will lead the design and implementation of core systems for our AI services, including resilient fault-tolerant queues, model catalogs, and scheduling...


  • San Francisco, California, United States Parafin Full time

    About the Role: We're seeking an experienced software engineer to join our Infrastructure team and help build the foundation for our rapidly growing financial technology platform. The Infrastructure team at Parafin owns core platforms spanning across cloud infrastructure, developer experience, data infrastructure, and security. As a key member of our team,...


  • San Jose, California, United States TikTok Full time

    Job Title: Senior Software Engineer - Search Infrastructure. We are seeking a highly skilled Senior Software Engineer to join our Search Infrastructure team at TikTok. As a key member of our team, you will be responsible for designing, developing, and maintaining the search infrastructure that powers our platform. Responsibilities: Design and develop scalable...


  • San Francisco, California, United States Deepscribe Full time

    At DeepScribe, we're revolutionizing the way healthcare professionals work with AI. Our mission is to bring joy back to medicine by automating documentation, allowing clinicians to focus on providing care. We're seeking a Senior Cloud Software Engineer to join our ML Infrastructure team. This team is responsible for building the core platforms and...

Senior Infrastructure Software Engineer

2 months ago


San Francisco, California, United States Anthropic Limited Full time

Position Overview:

Anthropic Limited is looking for experienced Infrastructure Engineers to help develop, scale, and maintain innovative AI systems. This role offers the opportunity to work with advanced AI technologies and contribute to the evolution of state-of-the-art models, in line with Anthropic's vision of safe, dependable AI systems that serve humanity.

Current Opportunities Available:
  • Data Infrastructure: The Data Infrastructure team designs, builds, and maintains the data systems that power our AI research and products. Working with teams across the company, you will assess their data needs, deliver robust and efficient data solutions, and continuously improve our data infrastructure. Your responsibilities will include building and optimizing data pipelines, enforcing data governance best practices, monitoring and resolving issues, and setting technical strategy for scalable, reliable data systems and pipelines. You will also design processes that support effective team operations and continuous improvement. Familiarity with technologies such as Spark, Airflow, dbt, and GCP and AWS cloud services is essential.

  • Research Infrastructure: The Research Infrastructure team builds and scales systems that let researchers iterate quickly, and ensures that the key systems and components used during development can operate at production scale as our model footprint grows.

  • Site Reliability Engineering: As a Site Reliability Engineer at Anthropic, you will design and implement scalable solutions, partner with development teams to improve infrastructure reliability, and set up monitoring, Service Level Objectives (SLOs), and Service Level Indicators (SLIs). You will apply fault-tolerant design patterns, build automation tools, and participate in an on-call rotation. Using Infrastructure as Code (IaC) principles, you will work with cross-functional teams to make new features and services reliable and scalable, and you will improve engineering reliability through better tooling.

  • Systems: The Systems team manages some of the largest and most complex clusters in the industry, which are used for training, research, and ultimately serving AI models. Your work will be vital to Anthropic's ability to train advanced models reliably and safely. You will build systems and operate large Kubernetes clusters running GPU, TPU, and Trainium workloads.

  • Observability: The Observability team designs, builds, and maintains the observability infrastructure that ensures the reliability, performance, and efficiency of our AI systems and services. You will work with teams across the company to understand their observability needs and deliver solutions using technologies such as Prometheus, Splunk, Cloud Logging, Grafana, and Honeycomb. Your role will involve building a configuration-driven approach to managing dashboards and alerts, implementing structured logging and tracing, optimizing the observability stack, and developing a reliable, low-maintenance system. By providing managed, centralized, and easy-to-use observability tools, you will promote a culture of operational excellence, proactive monitoring, and continuous improvement.
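
The Site Reliability Engineering description above mentions SLOs and SLIs. For illustration only, here is a minimal Python sketch of one common way to compute an availability SLI and the remaining error budget for an SLO. This is not Anthropic's tooling. The `SLO` class, the function names, the 99.9% target, and the request counts are all hypothetical. Real systems usually derive these numbers from metrics (for example, Prometheus counters) over a rolling window.

```python
from dataclasses import dataclass


@dataclass
class SLO:
    """Hypothetical availability SLO: target fraction of good requests in a window."""
    name: str
    target: float  # e.g. 0.999 means "99.9% of requests succeed"


def availability_sli(good: int, total: int) -> float:
    """SLI: ratio of good events to total events; 1.0 when there was no traffic."""
    if total == 0:
        return 1.0
    return good / total


def error_budget_remaining(slo: SLO, good: int, total: int) -> float:
    """Fraction of the error budget still unspent in the window.

    The budget is the number of bad events the SLO tolerates: (1 - target) * total.
    Returns 1.0 when nothing is spent, 0.0 when exactly spent, < 0 when exceeded.
    """
    allowed_bad = (1.0 - slo.target) * total
    actual_bad = total - good
    if allowed_bad == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    slo = SLO(name="api-availability", target=0.999)
    good, total = 999_500, 1_000_000  # hypothetical request counts
    print(f"SLI: {availability_sli(good, total):.4%}")
    print(f"Error budget remaining: {error_budget_remaining(slo, good, total):.0%}")
```

With 500 failed requests out of 1,000,000 against a 99.9% target, the window tolerates 1,000 failures, so half of the error budget has been spent. Teams often alert on how quickly this budget burns, rather than on individual errors.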

Key Responsibilities:
  • Lead the development of industry-leading AI clusters (ranging from thousands to hundreds of thousands of machines), collaborating closely with cloud service providers on cluster development and necessary features.
  • Engage with various stakeholders to thoroughly understand infrastructure, data, and compute requirements, identifying potential solutions to support advanced research and product development.
  • Establish technical strategies and oversee the development of high-scale, reliable infrastructure systems.
  • Mentor top technical talent within the organization.
  • Design processes (e.g., postmortem reviews, incident response, on-call rotations) that enhance team effectiveness and prevent repeated failures.

You may be an ideal candidate if you have:
  • 8+ years of relevant industry experience, with at least 3 years leading large-scale, complex projects or teams as an engineer or technical lead.
  • A strong passion for distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement.
  • Proficiency in at least one programming language (e.g., Python, Rust, Go, Java).
  • Excellent problem-solving skills and the ability to work independently.
  • A commitment to working closely with internal partners, such as research teams, to understand and support their needs.
  • Outstanding communication skills to build consensus with stakeholders, both internally and externally.
  • In-depth knowledge of modern cloud infrastructure, including Kubernetes, Infrastructure as Code, AWS, and GCP.

Preferred Qualifications:
  • Expertise in security and privacy best practices.
  • Experience with machine learning infrastructure, including GPUs, TPUs, or Trainium, as well as the networking and communication stack that supports them, such as NCCL.
  • Low-level systems experience, such as Linux kernel tuning and eBPF.
  • The ability to quickly understand systems design trade-offs and keep pace with rapidly evolving software systems.

Application Deadline: Applications will be reviewed on a rolling basis.