Senior AI Evaluation Scientist with Security Clearance

1 week ago


Fairfax, Virginia, United States Steampunk Full time


Overview

We are seeking an experienced Senior AI Evaluation Scientist to design and lead rigorous evaluation programs for predictive and generative AI systems across our enterprise and client engagements. This role is critical to ensuring that AI solutions are accurate, reliable, safe, and aligned with mission outcomes. The Senior AI Evaluation Scientist will develop evaluation frameworks, build automated testing pipelines, and act as a subject-matter expert on AI quality, risk, and performance measurement. This role blends deep technical expertise with analytical rigor, experimentation, and cross-functional collaboration.



Contributions

  • Lead the design and implementation of comprehensive evaluation frameworks for generative and predictive AI models, covering accuracy, robustness, relevance, trustworthiness, fairness, hallucination rates, and safety.
  • Develop and maintain automated evaluation pipelines that continuously audit model outputs, monitor quality drift, and validate alignment with mission-specific constraints.
  • Create custom benchmark datasets, challenge sets, and adversarial evaluation strategies tailored to client domains and regulatory requirements.
  • Conduct in-depth error analysis, model behavior studies, and sensitivity assessments to inform iterative improvements in prompts, retrieval systems, models, and orchestration frameworks.
  • Partner with AI Product Engineers, LLMOps Engineers, and Data Scientists to drive model improvements through structured experimentation, A/B testing, and scientifically grounded evaluation cycles.
  • Advise teams on measurement methodologies, statistical significance, and best practices for Trustworthy AI evaluation in alignment with NIST AI RMF, MLSecOps, and agency governance requirements.
  • Document evaluation results, risks, and findings for technical and non-technical audiences, including engineering teams, leadership, and government clients.
  • Contribute to the development of standardized tools, reusable templates, and evaluation components to improve repeatability and quality across engagements.
  • Stay informed of advances in LLM assessment, safety science, red-teaming methodologies, and evaluation frameworks emerging from academia and industry.
  • Mentor junior evaluation staff and help grow Steampunk's AI measurement and evaluation capabilities.
  • Contribute to the growth of our AI & Data Exploitation Practice.


Qualifications

  • Ability to hold a position of public trust with the U.S. government
  • Master's Degree (related program) and 7 years of relevant experience; OR
  • Bachelor's Degree (related program) and 10 years of relevant experience; OR
  • No degree and 16 years of relevant experience
  • Possess at least one professional certification relevant to the technical service provided, and maintain a certification relevant to the product being deployed and/or maintained.
  • 8+ years of experience evaluating machine learning, NLP, or generative AI systems, with strong familiarity with LLMs and retrieval-based architectures.
  • Deep understanding of evaluation metrics, statistical testing, dataset construction, experimental design, and model validation methodologies.
  • Hands-on experience with Python and libraries such as PyTorch, Hugging Face, LangChain, scikit-learn, and evaluation tooling (LLM-as-a-judge, rubric-based evaluators, or custom harnesses).
  • Demonstrated experience designing automated evaluation pipelines and integrating them into CI/CD or LLMOps workflows.
  • Strong understanding of AI governance, responsible AI principles, bias detection, fairness metrics, and risk identification.
  • Experience working with structured and unstructured datasets across multiple modalities (text, tabular, documents).
  • Familiarity with vector databases, RAG architectures, and multi-step LLM workflows.
  • Excellent analytical, written, and verbal communication skills, with the ability to translate evaluation insights into clear technical recommendations.
  • Proven ability to collaborate with cross-functional engineering and product teams while independently driving evaluation strategy.
  • Experience working in agile or iterative development environments and documenting scientific processes clearly.


About Steampunk

Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $135,000 to $170,000. The estimate displayed represents a typical annual salary range for this position. Annual salary is just one aspect of Steampunk's total compensation package for employees. Learn more about additional Steampunk benefits here.

Identity Statement

As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.

Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health, and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers, and on rewarding them for outstanding contributions to our growth. If you want to learn more about our story, visit


