Invited Talk
Understanding Systematic Deviations in Data for Trustworthy AI
Girmaw Abebe Tadesse
Auditorium
With the growing trend of employing machine learning (ML) models to assist decision making, it is vital to inspect both the models and their corresponding data for potential systematic deviations in order to achieve trustworthy ML applications. The data under inspection may be used for training or testing, or may be generated by the models themselves. Understanding systematic deviations is particularly crucial in resource-limited and/or error-sensitive domains such as healthcare. In this talk, I reflect on our recent work, which has utilized automated identification and characterization of systematic deviations for various tasks in healthcare, including data quality understanding, temporal drift, heterogeneous intervention effects analysis, and new class detection. Moreover, AI-driven scientific discovery is increasingly being facilitated by generative models, and I will share how our data-centric, multi-level evaluation framework helps to quantify the capabilities of generative models in both domain-agnostic and interpretable ways, using materials science as a use case. Beyond the analysis of curated datasets, which are often used to train ML models, similar data-centric analysis should also be applied to traditional data sources such as textbooks. To this end, I will conclude by presenting recent collaborative work on automated representation analysis in dermatology academic materials.