Invited Talk + Q&A
in
Workshop: Pitfalls of limited data and computation for Trustworthy ML
Impacts of Data Scarcity on Groups and Harnessing LLMs for Solution (Fereshte Khani)
In this talk, I address the challenges posed by underspecification and data scarcity in machine learning, focusing on how their impact varies across groups. I review prior methods for addressing these challenges, such as selective classification, and discuss their limitations in modern machine learning.
To overcome these limitations, I highlight the necessity of empowering individuals to create data based on their unique concepts. However, data generation has its own challenges: it is difficult to create data for a concept without introducing shortcuts or interfering with the original data or other concepts. To address these obstacles, I introduce CoDev, a novel framework for the collaborative development of NLP models. CoDev enables individuals to collaborate with AI and with each other to generate data in a controlled manner that respects the integrity of existing concepts and the original data. I conclude the talk by discussing the inherent limitations of data that persist even when data is unlimited.