Oral (Contributed Talk) in Workshop: Setting up ML Evaluation Standards to Accelerate Progress
A Case for Better Evaluation Standards in NLG
Sebastian Gehrmann · Elizabeth Clark · Thibault Sellam
Abstract:
Evaluating natural language generation (NLG) models has become a popular and active field of study, which has led to the release of novel datasets, automatic metrics, and human evaluation methods. Yet, newly established best practices are often not adopted. Moreover, the research process is hindered by the scarcity of released resources such as model outputs, and a lack of documentation of evaluation parameters complicates judging new NLG methods. We analyze 66 papers published in 2021 across 29 different dimensions to quantify these issues, and we identify promising ways for the research community to improve the reporting and reviewing of experimental results.