Poster in Workshop: Setting up ML Evaluation Standards to Accelerate Progress
Why External Validity Matters for Machine Learning Evaluation: Motivation and Open Problems
Thomas I. Liao · Rohan Taori · Ludwig Schmidt · Inioluwa Raji
New machine learning methods often fail to perform as expected on datasets similar to the benchmarks reported in their respective papers. These performance gaps pose a challenge for evaluation: researchers and practitioners alike expect (or hope) that a machine learning model which performs well on a dataset designed for a task will also perform well on other datasets matched to that task. We argue that external validity, the relationship between tasks and the learning problems which instantiate them, is understudied. We highlight the ways in which algorithm developers and benchmark creators fail to address this concern of external validity, suggest some remedies, and identify open questions in external validity whose answers would help the community build better benchmarks and better understand model performance.