Site Reliability Engineer

3 weeks ago


New York, United States Baseten Full time
ABOUT BASETEN

We're a growing team of builders backed by top-tier investors, including IVP, Spark Capital, and Sarah Guo at Conviction. ML teams at enterprises and category-defining AI-native companies like Descript, Bland, and Patreon use Baseten to power their core production workloads with best-in-class performance, security, and reliability. While we've found product-market fit (PMF) and secured Series B funding, the ML infrastructure market is massive, and we're just getting started. If you're excited to work on engaging and relevant problems while building something new from the ground up, join us.

THE ROLE

As a Site Reliability Engineer, you'll envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient. This can range from automating deployments and monitoring systems to optimizing performance and managing incidents.

We all work closely with our users, learning from their past struggles in operationalizing ML, onboarding them onto our platform, and turning our learnings into ideas for improving Baseten.

RESPONSIBILITIES:
  • Build and maintain scalable infrastructure to support the deployment and operation of machine learning models.
  • Establish standards and best practices for reliability and performance across the infrastructure.
  • Automate processes when relevant, particularly for managing CI/CD pipelines.
  • Own products and projects end-to-end, functioning as both an engineer and a project manager, with a focus on user empathy, project specification, and end-to-end execution.
  • Collaborate with cross-functional teams to understand project requirements and translate them into technical solutions.
  • Mentor junior team members and contribute to knowledge sharing within the organization.
  • Navigate ambiguity and exercise good judgment on tradeoffs and tools needed to solve problems, avoiding unnecessary complexity.
  • Demonstrate pride, ownership, and accountability for your work, expecting the same from your teammates.
REQUIREMENTS:
  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
  • 3+ years of professional work experience in a fast-paced, high-growth environment.
  • Extensive experience with Kubernetes.
  • Experience in building and maintaining scalable infrastructure.
  • Experience with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi) and CI/CD tooling (e.g., GitHub Actions, GitLab CI, CircleCI, Jenkins).
  • Experience with open-source observability tools (e.g., Prometheus, the ELK stack, the Grafana stack, OpenTelemetry) is a plus.
  • Ability to own projects end-to-end, from project specification to execution.
  • No prior machine learning experience required, but should be open to learning about it.
BENEFITS:
  • Competitive compensation package (Unlimited PTO, 401k, covered healthcare premiums).
  • A unique opportunity to be part of a rapidly growing startup in one of the most exciting engineering fields of our era.
  • An inclusive and supportive work culture that fosters learning and growth.
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.


Apply now to embark on a rewarding journey shaping the future of AI. If you are a motivated individual with a passion for machine learning and a desire to be part of a collaborative and forward-thinking team, we would love to hear from you.

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants without regard to race, color, religion, gender, sexual orientation, gender identity or expression, national origin, age, genetic information, disability, or veteran status.

  • New York, United States Automatic Data Processing Full time

    ADP is hiring a Site Reliability Engineer. Do you thrive in a challenging environment, love production systems, and have a curious nature with a thirst for pushing the limits? Are you inspired by transformation and making an impact on the lives of millions...


  • New York, United States Unreal Gigs Full time

    Job Summary: We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and...


  • New York, United States Unreal Gigs Full time

    Job Summary: We are in search of a Site Reliability Engineer to join our tech startup specializing in infrastructure and authorization solutions. As a Site Reliability Engineer, you'll be pivotal in ensuring the reliability, availability, and performance of our systems. Your role will involve designing, implementing, and maintaining scalable infrastructure...


  • New York, United States RedTech Recruitment Full time

    Site Reliability Engineer – Graduates considered. We are excited to be able to offer this Site Reliability Engineer role working for an industry-leading software company. This company has won several awards and is a pioneer in machine learning technology. Founded 8 years ago, with a team of 150 brilliant engineers, they are already renowned as having...


  • New York, United States Hyperion Industries Full time

    Company Description: Join us on an exhilarating mission at Hyperion, a VC-backed startup working with Tim Hwang, CEO of FiscalNote (NYSE: NOTE). Our co-founders, with their extensive AI and engineering backgrounds from Google, Amazon, Workday, and Instacart, are leading the charge. Our mission is to revolutionize Site Reliability Engineering (SRE) with an...


  • New York, United States Mondrian Alpha Full time

    An industry-leading systematic trading fund is seeking highly skilled Site Reliability Engineers to join a team responsible for engineering and supporting the company's critical infrastructure platforms. This team also handles the centralized development infrastructure and works alongside engineering teams across the business to assure the optimal route of...


  • New York, United States ICTerGezocht Full time

    Location: Amsterdam. Vacancy in brief: Have you ever thought about how many people log in to the app or Internet Banking website each month? Over five million! The objective of the Personal Banking Grid is to ensure that each visit is not only secure but also a personal and smooth experience. As a Site Reliability Engineer, you play a key role in this mission. You will...


  • New York, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • New York, United States InterEx Group Full time

    Senior Site Reliability Engineer. PRIMARY ACCOUNTABILITIES: Improve the reliability of mission-critical solutions, applications, and platforms; software development for enterprises; continuous improvement identification and implementation; manage risks and resolve issues that affect applications; lead efforts to troubleshoot and/or debug issues in any...


  • New York, New York, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • New York, United States Hebbia Full time

    About Hebbia: The user interface for AGI. Hebbia is AI that works the way you work. Designed to be generally capable, it can tackle even the most complex tasks, citing answers from any number of sources. By showing its work, Hebbia empowers users to collaborate with AI on each step and validate responses instead of blindly trusting them. Our mission is to...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in NJ. Contract Duration: Long-term Engagement. Compensation: $50 per hour. Note: No OPT/CPT candidates will be considered. We are seeking a highly skilled Senior Site Reliability Engineer (SRE) with subject matter expertise. The ideal candidate will possess exceptional communication skills and the...


  • New York, New York, United States Streaming Talent Full time

    Streaming Talent is seeking a highly skilled Site Reliability Engineer to join our client's US team. As a key member of the Site Reliability Team, you will be responsible for ensuring the smooth operation of the company's Content Delivery Network. The ideal candidate will have a strong background in cloud technologies, with experience working with Kubernetes...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term. Compensation: $50 per hour. This role requires a highly skilled individual with a strong background in Site Reliability Engineering. The ideal candidate will possess exceptional communication abilities and the confidence to engage with executive-level teams. Key...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term Engagement. Compensation: $50 per hour. This role requires a highly skilled individual with a strong background in Site Reliability Engineering. The ideal candidate will possess exceptional communication abilities and the confidence to engage with executive-level...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term Engagement. Compensation: $50 per hour. This role requires a highly skilled individual with a strong background in Site Reliability Engineering. The ideal candidate will possess: exceptional communication skills, with the ability to engage confidently with...


  • New York, New York, United States Astir IT Solutions, Inc. Full time

    Position: Senior Site Reliability Engineer. Location: Onsite in New Jersey. Contract Duration: Long-term Engagement. Compensation: $50 per hour. This role requires a seasoned professional with a strong background in Site Reliability Engineering. The ideal candidate will possess exceptional communication skills and the confidence to engage with executive-level...