Infrastructure Reliability Engineer

2 months ago


San Francisco, California, United States BaseTen Labs, Inc. Full time

ABOUT BASETEN LABS, INC.

We are an innovative team of creators supported by leading investors such as IVP, Spark Capital, and Sarah Guo at Conviction. Machine learning teams at major enterprises and pioneering AI-native organizations use Baseten to run their core production workloads with top-tier performance, security, and dependability. Having achieved product-market fit and secured Series B funding, we are poised to make significant strides in the expansive ML infrastructure market. If you are eager to tackle engaging and pertinent challenges while contributing to the development of groundbreaking solutions, we invite you to explore this opportunity.

As a Site Reliability Engineer, your role will involve conceptualizing and constructing resilient systems and processes that guarantee our infrastructure remains scalable, dependable, and efficient. This encompasses automating deployments, monitoring systems, optimizing performance, and managing incidents.

Our team collaborates closely with users, gaining insights from their previous challenges in operationalizing ML, guiding them through our platform, and transforming our experiences into actionable ideas for enhancing Baseten.

KEY RESPONSIBILITIES:

  • Develop and sustain scalable infrastructure solutions.
  • Possess extensive expertise in Kubernetes.
  • Recognize the value of automation and implement it where appropriate, such as in managing CI/CD pipelines.
  • Establish standards and best practices for reliability and performance.
  • While prior ML experience is not mandatory, a willingness to learn about it is essential.
  • Bonus qualification: experience with open-source observability tools (Prometheus, ELK stack, Grafana stack, OpenTelemetry).

ADDITIONAL QUALIFICATIONS:

  • Capable of managing products and projects from inception to completion; our engineers and designers also act as project managers, so we value team members who can empathize with users, comprehend and draft project specifications, and oversee the full execution of projects.
  • Comfortable navigating uncertainty and appreciating the process as much as the outcome.
  • Driven by customer challenges and eager to craft simple, elegant solutions that minimize unnecessary complexity.
  • Exercise sound judgment regarding trade-offs and the tools required to address problems, avoiding an overemphasis on trendy technologies unless they are the right fit.
  • Exhibit pride, ownership, and accountability for your work, expecting the same from colleagues.

TECHNOLOGY STACK:

Backend — Go, Python, Postgres

Platform — Kubernetes, Go, Postgres, Redis, Kafka

Infrastructure — GitOps, Flux, Terraform, AWS/GCP

WHAT WE PROVIDE:

  • Competitive compensation package (unlimited PTO, 401(k), covered healthcare premiums).
  • A unique chance to be part of a rapidly expanding startup in one of the most thrilling engineering domains of our time.
  • An inclusive and supportive work environment that promotes learning and growth.
  • Exposure to a variety of ML startups, providing unparalleled learning and networking opportunities.

We are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.