Skip to content

Rutgers.edu   |   Rutgers Search

Humans First Fund
  • About
    • Students
    • People
    • Our Values
    • Programs
  • Human Rights Index
    • Purpose
    • Human Rights
    • Principles
    • Instruments
    • Sectors
    • Glossary
    • CHARTER
    • Editors’ Desk
  • Project Insight
  • Publications
    • AI & Human Rights Index
    • Moral Imagination
    • Human Rights in Global AI Ecosystems
  • Courses
    • AI & Society
    • AI Ethics & Law
    • AI & Vulnerable Humans
  • News
  • Opportunities
  • About
    • Students
    • People
    • Our Values
    • Programs
  • Human Rights Index
    • Purpose
    • Human Rights
    • Principles
    • Instruments
    • Sectors
    • Glossary
    • CHARTER
    • Editors’ Desk
  • Project Insight
  • Publications
    • AI & Human Rights Index
    • Moral Imagination
    • Human Rights in Global AI Ecosystems
  • Courses
    • AI & Society
    • AI Ethics & Law
    • AI & Vulnerable Humans
  • News
  • Opportunities
  • All
  • A
  • B
  • C
  • D
  • E
  • F
  • G
  • H
  • I
  • J
  • K
  • L
  • M
  • N
  • O
  • P
  • Q
  • R
  • S
  • T
  • U
  • V
  • W
  • X
  • Y
  • Z

Evaluation

Evaluation (commonly referred to as an “eval”) measures an AI model's performance on a specific set of benchmark tasks. Researchers use evals to assess their models' strengths and weaknesses by comparing the model's answers to the correct answers for each task, computing the accuracy, and identifying areas for improvement. However, researchers should heed Goodhart’s Law when optimizing performance on evals: using a measure as a target can render it ineffective. A model may excel at tasks in the dataset but fail to generalize across the domain, or hidden capabilities may go unnoticed if certain examples are excluded. To assess strengths and weaknesses, individuals and organizations should curate diverse evaluation datasets. As the old adage goes, “what gets measured gets managed.”


Disclaimer: Our global network of contributors to the AI & Human Rights Index is currently writing these articles and glossary entries. This particular page is currently in the recruitment and research stage. Please return later to see where this page is in the editorial workflow. Thank you! We look forward to learning with and from you.

  • Rutgers.edu
  • New Brunswick
  • Newark
  • Camden
  • Rutgers Health
  • Online
  • Rutgers Search
About
  • Mission
  • Values
  • People
  • Courses
  • Programs
  • News
  • Opportunities
  • Style Guide
Human Rights Index
  • Purpose
  • Human Rights
  • Principles
  • Sectors
  • Glossary
Project Insight
Moral Imagination
Humans First Fund

Dr. Nathan C. Walker
Principal Investigator, AI Ethics Lab

Rutgers University-Camden
College of Arts & Sciences
Department of Philosophy & Religion

AI Ethics Lab at the Digital Studies Center
Cooper Library in Johnson Park
101 Cooper St, Camden, NJ 08102

Copyright ©2025, Rutgers, The State University of New Jersey

Rutgers is an equal access/equal opportunity institution. Individuals with disabilities are encouraged to direct suggestions, comments, or complaints concerning any accessibility issues with Rutgers websites to accessibility@rutgers.edu or complete the Report Accessibility Barrier / Provide Feedback Form.