Head of AI

3 weeks ago


San Francisco, United States Pear VC Full time
About the Role:

We are looking for a founding engineer to have ownership of the public reporting and core evaluation methodology. We expect you to have top technical ability, high autonomy, and the dynamism necessary to adapt to a changing research landscape.

We aim to build a small applied AI lab within Vals AI. You will be its first hire and leader, eventually assembling a world-class team of MLEs. You will track the latest developments in research and implement new methodology where it improves the accuracy of our evaluations.

We are building the standard for public evaluation of LLMs in enterprise tasks and you will direct that effort.

Responsibilities

The Head of AI is responsible for furthering our public benchmarks. The areas you will investigate are (1) methods for reliably and accurately evaluating generated text at scale, (2) training an expert reward function and judge, and (3) generating synthetic evaluation datasets. This domain is wide open, with many challenging problems that are crucial to solve correctly.

It's worth highlighting that in this role you are a founding team member, not an employee. This means you have ownership in the company and product direction. You don't take orders, you engage in discussions. You will have resources at your disposal to support your work. This includes the ability to contract experts for data labeling (we have existing partnerships which enable this) and to hire additional researchers or engineers.

Requirements:
  • MS/PhD in Computer Science, Artificial Intelligence, Math, Physics, or a related field.
  • 2+ years of prior experience in NLP, building and shipping machine learning infrastructure in industry.
  • Strong experience with Python, especially in production settings.
  • Experience working in teams. This includes working in development sprints, knowing best practices for working with Git, and reviewing pull requests.
  • Strong communication skills. You can provide input to others and equally receive/integrate feedback.
  • The tenacity to iterate and develop quickly.
  • We are an in-person team, based in San Francisco. We will support your relocation or transportation as needed.
Nice to haves:
  • Research experience with papers published in reputable journals.
  • Experience hiring + building a team of MLEs.
  • Experience working with Django or other Python-based HTTP servers (e.g. Flask).
About Us

Measuring model ability is the most challenging part of creating applications that are capable of automating any given part of the economy. There are no good techniques or benchmarks for evaluating LLM performance on business-relevant tasks, so adoption for enterprise production settings has been limited (see Wittgenstein's ruler).

This problem materializes in each place where LLMs have potential: for companies building AI tools, in understanding whether their product will satisfy customer demand; for enterprises, in determining how feasible models and vendors are when making purchasing decisions; and for researchers, who need a north star toward which to expand model ability.

Today, answering these questions amounts to hiring a human review team to manually evaluate model outputs. This is prohibitively expensive and slow.

Vals AI is building the enterprise benchmark of LLMs and LLM apps on real-world business tasks. In doing so, we are creating the infrastructure + certification to automatically audit LLM applications, verifying they are ready for consumption.

See our benchmarks and launch announcement in Bloomberg. We aim to build the barometer for whether AI is useful, and in doing so, accelerate the automation of all knowledge work.

What we are building:

Our core technology enables us to review + automatically audit LLM applications in high value industries (legal, insurance, finance, healthcare). With this and our own data, we maintain a public benchmark of the major LLMs on enterprise tasks. Our success will be based on three components:
  1. Our evaluation performs at human-level accuracy on the relevant axes for each industry/application.
  2. Our platform has an intuitive interface that acts as a shared platform between human reviewers and engineers.
  3. We become the industry-standard benchmark, maintaining a loss-leading effort by publishing free reports and collaborating with credible data partners.

To achieve each of these, we are looking for machine learning engineers (Head of AI, Member of Technical Staff) to develop novel evaluation techniques, strong designers and front-end engineers (Founding Engineer (Product Engineer)) to contribute to the platform, and a tenacious operator to write reports and maintain our social media (email rayan@vals.ai if this is of interest).

What we offer:
  • Highly competitive salary and meaningful ownership. Excellence is well rewarded.
  • Relocation and transportation support.
  • Health/vision/dental insurance coverage.
  • Lunch and dinner provided, free snacks/coffee/drinks.
  • Unlimited PTO.
About us:

Founding team: The core methodology behind this platform comes from NLP evaluation research we did at Stanford. We raised a 5M seed round from some of the top institutional and angel investors in the valley. Our team has prior work experience at NVIDIA, Meta, Microsoft, Palantir, and HRT. Collectively, we have over 300 citations in our published work.

Tech stack: Our frontend is built in React with TSX. We use Django as our back-end framework. All of the infra is on AWS.

What we're looking for:
  • Intelligence is more important than a good-looking resume. Industry experience and pedigree are valuable only insofar as they are a proxy for talent itself.
  • Ownership to create products. We don't have the scale or time to actively "manage" every project or task. Working in a small, talent-dense team, we expect everyone to show initiative to build where it's needed, not where it's asked. We strive for autonomy over consensus.
  • Intensity. The LLM landscape is constantly changing. Foundation model labs are continuously pushing the frontier, enterprises are seeing massive pressure to adopt technology, and startups are hungry to chase the white space. The unicorn companies that will emerge from this technology shift are being built now. Those that win will have an incredibly high speed of execution.
  • See solutions, not problems. We're not looking for people who pass hard problems to others or admit defeat, but for people who see the opportunity to craft solutions at each juncture.
Further Reading:
  • Hugging Face blog on evaluation
  • Anthropic's blog on challenges in evaluation
  • New York Times article on issues in benchmarking
  • Stanford HAI report showing hallucinations in legal tech tools


Referral Bonus

Know someone who would be a good fit? Connect them with rayan@vals.ai. If we hire them and they stay on for 90 days, you'll get a 10k referral bonus and Vals AI merch.