Workshop
Setting up ML Evaluation Standards to Accelerate Progress
Rishabh Agarwal · Stephanie Chan · Xavier Bouthillier · Caglar Gulcehre · Jesse Dodge
Fri 29 Apr, 5 a.m. PDT
The aim of the workshop is to discuss and propose standards for evaluating ML research, in order to better identify promising new directions and to accelerate real progress in the field. This requires understanding which practices add to or detract from the generalizability or reliability of reported results, and what incentives lead researchers to follow best practices. We may draw inspiration from adjacent scientific fields, from statistics, or from the history of science. Acknowledging that there is no consensus on best practices for ML, the workshop will focus on panel discussions and a few invited talks representing a variety of perspectives. The call for papers welcomes opinion papers as well as more technical papers on the evaluation of ML methods. We plan to summarize the findings and topics that emerge during the workshop in a short report.
Call for Papers: https://ml-eval.github.io/call-for-papers/
Submission Site: https://cmt3.research.microsoft.com/SMILES2022
Schedule
Fri 5:00 a.m. - 5:15 a.m. | Welcome Note | Stephanie Chan · Rishabh Agarwal · Caglar Gulcehre · Xavier Bouthillier · Jesse Dodge
Fri 5:15 a.m. - 5:50 a.m. | Invited Talk (Thomas Wolf)
Fri 5:55 a.m. - 6:00 a.m. | Q&A for Thomas Wolf
Fri 6:00 a.m. - 6:40 a.m. | Invited Talk (Frank Schneider) | Frank Schneider
Fri 6:40 a.m. - 6:45 a.m. | Q&A for Philipp Hennig and Frank Schneider | Philipp Hennig
Fri 6:45 a.m. - 7:35 a.m. | Invited Talk (Rotem Dror) | Rotem Dror
Fri 7:25 a.m. - 7:30 a.m. | Q&A for Rotem Dror | Rotem Dror
Fri 7:30 a.m. - 7:40 a.m. | Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective (Oral, Contributed Talk) | Dennis Ulmer · Elisa Bassignana · Max Müller-Eberstein · Daniel Varab · Mike Zhang · Christian Hardmeier · Barbara Plank
Fri 7:40 a.m. - 7:50 a.m. | A Case for Better Evaluation Standards in NLG (Oral, Contributed Talk) | Sebastian Gehrmann · Elizabeth Clark · Thibault Sellam
Fri 7:50 a.m. - 8:50 a.m. | Reproducibility and Rigor in ML (Panel) | Sara Hooker · Rishabh Agarwal · Frank Schneider · Koustuv Sinha · Rotem Dror · Dr. Gael Varoquaux
Fri 8:50 a.m. - 9:45 a.m. | Poster Session 1 (Gather.Town)
Fri 9:35 a.m. - 10:15 a.m. | Invited Talk (James Evans) | James Evans
Fri 10:15 a.m. - 10:20 a.m. | Q&A for James Evans | James Evans
Fri 10:20 a.m. - 11:20 a.m. | Slow vs Fast Science (Panel) | Chelsea Finn · Oriol Vinyals · James Evans · Michela Paganini · Russ Poldrack
Fri 11:20 a.m. - 11:50 a.m. | Coffee Break
Fri 11:50 a.m. - 12:30 p.m. | Invited Talk (Melanie Mitchell) | Melanie Mitchell
Fri 12:30 p.m. - 12:35 p.m. | Q&A for Melanie Mitchell
Fri 12:35 p.m. - 1:15 p.m. | Invited Talk (Katherine Heller) | Katherine Heller
Fri 1:15 p.m. - 1:20 p.m. | Q&A for Katherine Heller | Katherine Heller
Fri 1:20 p.m. - 2:00 p.m. | Invited Talk (Corinna Cortes) | Corinna Cortes
Fri 2:00 p.m. - 2:05 p.m. | Q&A for Corinna Cortes
Fri 2:05 p.m. - 2:15 p.m. | A Siren Song of Open Source Reproducibility (Oral, Contributed Talk) | Edward Raff · Andrew Farris
Fri 2:15 p.m. - 2:25 p.m. | Integrating Rankings into Quantized Scores in Peer Review (Oral, Contributed Talk) | Yusha Liu · Yichong Xu · Nihar Shah · Aarti Singh
Fri 2:25 p.m. - 2:35 p.m. | Tradeoffs in Preventing Manipulation in Paper Bidding for Reviewer Assignment (Oral, Contributed Talk) | Steven Jecmen · Nihar Shah · Fei Fang · Vincent Conitzer
Fri 2:35 p.m. - 3:35 p.m. | Incentives for Better Evaluation (Panel) | Yoshua Bengio · Corinna Cortes · John Langford · Kyunghyun Cho · Xavier Bouthillier
Fri 3:35 p.m. - 3:35 p.m. | Poster Session 2 & Closing Remarks (Gather.Town)
- | Does the Market of Citations Reward Reproducible Work? (Poster) | Edward Raff
- | A Siren Song of Open Source Reproducibility (Poster) | Edward Raff · Andrew Farris
- | What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability (Poster) | John Kirchenbauer · Jacob Oaks · Eric Heim
- | System Analysis for Responsible Design of Modern AI/ML Systems (Poster) | Virginia Goodwin · Rajmonda Caceres
- | A Brief Guide to Designing and Evaluating Human-Centered Interactive Machine Learning (Poster) | Kory Mathewson · Patrick Pilarski
- | deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks (Poster) | Dennis Ulmer · Christian Hardmeier · Jes Frellsen
- | Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective (Poster) | Dennis Ulmer · Elisa Bassignana · Max Müller-Eberstein · Daniel Varab · Mike Zhang · Christian Hardmeier · Barbara Plank
- | Reproducible Subjective Evaluation (Poster) | Max Morrison · Brian Tang · Gefei Tan · Bryan Pardo
- | Increasing Confidence in Adversarial Robustness Evaluations (Poster) | Roland S. Zimmermann · Wieland Brendel · Florian Tramer · Nicholas Carlini
- | Machine Learning State-of-the-Art with Uncertainties (Poster) | Peter Steinbach · Steve Schmerler · Sebastian Starke · Mahnoor Tanveer · Felicita Purnama Dewi Gernhardt
- | A Case for Better Evaluation Standards in NLG (Poster) | Sebastian Gehrmann · Elizabeth Clark · Thibault Sellam
- | Setting Clear Expectations for Uncertainty Estimation (Poster) | Victor Bouvier · Simona Maggio · Alexandre Abraham · Léo Dreyfus-Schmidt
- | Rethinking Machine Learning Model Evaluation in Pathology (Poster) | Aaditya Prakash · Dinkar Juyal · Syed Javed · Zahil Shanis · Shreya Chakraborty · Harsha Pokkalla
- | Strengthening Subcommunities: Towards Sustainable Growth in AI Research (Poster) | Andi Peng · Jessica Forde · Yonadav Shavit · Jonathan Frankle
- | A meta analysis of data-driven newsvendor approaches (Poster) | Simone Buttler · Andreas Philippi · Nikolai Stein · Richard Pibernik
- | CheckDST: Measuring Real-World Generalization of Dialogue State Tracking Performance (Poster) | Hyundong Cho · Chinnadhurai Sankar · Christopher Lin · Kaushik Ram Sadagopan · Shahin Shayandeh · Asli Celikyilmaz · Jonathan May · Ahmad Beirami
- | A Survey On Uncertainty Toolkits For Deep Learning (Poster) | Maximilian Pintz · Joachim Sicking · Maximilian Poretschkin · Maram Akila
- | Integrating Rankings into Quantized Scores in Peer Review (Poster) | Yusha Liu · Yichong Xu · Nihar Shah · Aarti Singh
- | A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms (Poster) | Maxime Alvarez · Jean-Charles Verdier · DJeff Kanda Nkashama · Froduald Kabanza · Marc Frappier · Pierre Martin Tardif
- | Rethinking Streaming Machine Learning Evaluation (Poster) | Shreya Shankar · Bernease Herman · Aditya Parameswaran
- | Tradeoffs in Preventing Manipulation in Paper Bidding for Reviewer Assignment (Poster) | Steven Jecmen · Nihar Shah · Fei Fang · Vincent Conitzer
- | Towards Yet Another Checklist for New Datasets (Poster) | Stefan Larson
- | Are Ground Truth Labels Reproducible? An Empirical Study (Poster) | Ka Wong · Praveen Paritosh · Kurt Bollacker
- | Incentivizing Empirical Science in Machine Learning: Problems and Proposals (Poster) | Preetum Nakkiran · Misha Belkin
- | Why External Validity Matters for Machine Learning Evaluation: Motivation and Open Problems (Poster) | Thomas I. Liao · Rohan Taori · Ludwig Schmidt · Inioluwa Raji
- | A Quality-Diversity-based Evaluation Strategy for Symbolic Music Generation (Poster) | Berker Banar · Simon Colton