Learning From Mislabeled Training Data Through Ambiguous Learning for In-Home Health Monitoring

Data for machine learning tasks in in-home health monitoring are widely collected via the IoT, and mislabeled training data make the resulting models unreliable. Researchers have proposed a wide array of algorithms to deal with mislabeled training data; one straightforward and effective solution is to filter noisy instances directly out of the training data so that the negative effects of mislabeled data are minimized. However, noise filtering can be suboptimal because mislabeled data are not completely useless: their features and distributions are still useful for learning, especially when training data are insufficient. In this work, we propose a novel framework for learning from mislabeled training data through ambiguous learning (LeMAL). LeMAL consists of two main parts. First, it converts the original training data into ambiguous data. Second, it applies an ambiguous learning algorithm to the ambiguous data; for this step, we propose a novel distance-based ambiguous learning algorithm that makes better use of the ambiguous data. Finally, we demonstrate that LeMAL can effectively improve learning performance over existing noise filtering methods.
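The two-part structure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how LeMAL builds ambiguous data or how its distance-based learner works, so both rules below are assumptions chosen for illustration, not the paper's method: an instance becomes ambiguous (it receives a set of candidate labels) when the majority label of its nearest neighbours disagrees with its own label, and prediction uses a distance-weighted nearest-neighbour vote in which each neighbour splits its weight across its candidate labels. The function names, the parameter `k`, and the toy activity labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def to_ambiguous(X, y, k=3):
    """Part 1 (assumed rule): turn each label into a candidate label set.

    An instance keeps its own label; if the majority label among its k
    nearest neighbours differs, that label is added as a second candidate
    instead of the instance being filtered out.
    """
    candidates = []
    for i, xi in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda j: euclidean(xi, X[j]))[:k]
        majority = Counter(y[j] for j in nearest).most_common(1)[0][0]
        candidates.append({y[i], majority})
    return candidates


def predict(X, candidates, query, k=3):
    """Part 2 (assumed rule): distance-weighted vote over candidate sets.

    Each of the k nearest training instances contributes a weight of
    1 / distance, shared equally among its candidate labels.
    """
    nearest = sorted(range(len(X)), key=lambda j: euclidean(query, X[j]))[:k]
    scores = defaultdict(float)
    for j in nearest:
        weight = 1.0 / (euclidean(query, X[j]) + 1e-9)
        for label in candidates[j]:
            scores[label] += weight / len(candidates[j])
    return max(scores, key=scores.get)


# Hypothetical toy data: two clusters; the point (1, 1) sits in the "rest"
# cluster but carries the (presumably wrong) label "walk".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["rest", "rest", "rest", "walk", "walk", "walk", "walk", "walk"]

candidates = to_ambiguous(X, y)
prediction = predict(X, candidates, (0.9, 0.9))
print(candidates[3], prediction)
```

In this toy setup the suspicious instance is retained with two candidate labels rather than discarded, so its position in feature space still contributes to the vote; a query next to it is assigned to the surrounding cluster's label instead of inheriting the suspect label outright.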
