Ground truthing from multi-rater labeling with three-way decision and possibility theory

Abstract

In recent years, Machine Learning (ML) has attracted wide interest as an aid for decision makers in complex domains such as medicine. Although domain experts are typically aware of the intrinsic uncertainty surrounding the Ground Truth (GT), the issue of GT quality has scarcely been addressed in the ML literature. GT quality is routinely assumed to be adequate, regardless of the number and skills of the raters involved in data annotation. These factors can, however, have a severe negative impact on the reliability of ML models. In this article we study the influence of GT quality, in terms of the number of raters, their expertise, and their level of agreement, on the performance of ML models. We introduce the concept of reduction: a computational procedure that produces a single-target GT from a multi-rater setting. We propose three reductions, based on three-way decision, possibility theory, and probability theory, respectively. We characterize these reductions from the perspective of learning theory and propose two ML algorithms. We report the results of experiments on both real-world medical and synthetic datasets, showing that GT quality strongly affects the performance of ML models and that the proposed algorithms can handle this form of uncertainty better than state-of-the-art approaches.
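The abstract does not spell out how the three reductions are computed. As a rough illustration of what "reducing" multi-rater labels to a single target can look like, here is a minimal sketch. It assumes each instance comes with a list of categorical labels, one per rater, and optional per-rater weights standing in for expertise. The function names, the weighting scheme, the max-normalization used for the possibilistic case, and the thresholds `alpha` and `beta` in the three-way case are all hypothetical choices made for this example; they are not the authors' definitions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Set

Label = Hashable


def probabilistic_reduction(ratings: List[Label],
                            weights: Optional[List[float]] = None) -> Dict[Label, float]:
    """Hypothetical probabilistic reduction: (weighted) relative frequency of each label.

    `weights` is an assumed stand-in for rater expertise; equal weights by default.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if weights is None:
        weights = [1.0] * len(ratings)
    if len(weights) != len(ratings):
        raise ValueError("one weight per rating is required")
    totals: Dict[Label, float] = defaultdict(float)
    for label, w in zip(ratings, weights):
        totals[label] += w
    z = sum(totals.values())
    return {label: t / z for label, t in totals.items()}


def possibilistic_reduction(ratings: List[Label],
                            weights: Optional[List[float]] = None) -> Dict[Label, float]:
    """Hypothetical possibilistic reduction via max-normalization of the frequencies.

    Assumption: a simple ratio-scale transformation, so the most supported label gets
    possibility 1 and the others are scaled relative to it.
    """
    probs = probabilistic_reduction(ratings, weights)
    top = max(probs.values())
    return {label: p / top for label, p in probs.items()}


def three_way_reduction(ratings: List[Label],
                        weights: Optional[List[float]] = None,
                        alpha: float = 0.75,
                        beta: float = 0.25) -> Set[Label]:
    """Hypothetical three-way reduction with acceptance/rejection thresholds.

    A label with support >= alpha is accepted and returned alone; labels with
    support <= beta are rejected; if no label is accepted, the remaining
    (boundary) labels are returned as a set-valued target.
    """
    if not (0.0 <= beta < alpha <= 1.0 and alpha > 0.5):
        raise ValueError("need 0 <= beta < alpha <= 1 and alpha > 0.5")
    probs = probabilistic_reduction(ratings, weights)
    accepted = [label for label, p in probs.items() if p >= alpha]
    if accepted:  # alpha > 0.5 guarantees at most one accepted label
        return {accepted[0]}
    boundary = {label for label, p in probs.items() if p > beta}
    # If every label was rejected, fall back to all observed labels.
    return boundary or set(probs)
```

For example, with three raters voting `["malignant", "malignant", "benign"]`, the probabilistic sketch gives 2/3 and 1/3, the possibilistic sketch gives 1.0 and 0.5, and the three-way sketch (with the default thresholds) returns both labels as an ambiguous target, since no label reaches 0.75 support.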
