Learning classification models with soft-label information

OBJECTIVE Learning classification models in medicine often relies on data labeled by a human expert. Because labeling clinical data can be time-consuming, reducing labeling costs is critical to learning such models automatically. In this paper we propose a new machine learning approach that learns improved binary classification models more efficiently by refining the binary class information in the training phase with soft labels that reflect how strongly the human expert feels about the original class labels.

MATERIALS AND METHODS We propose two types of methods that can learn improved binary classification models from soft labels. The first relies on probabilistic/numeric labels, the second on ordinal categorical labels. We study and demonstrate the benefits of these methods for learning an alerting model for heparin-induced thrombocytopenia. The experiments are conducted on 377 patient instances labeled by three different human experts. The methods are compared using the area under the receiver operating characteristic curve (AUC).

RESULTS Our AUC results show that the new approach learns classification models more efficiently than traditional learning methods. The improvement in AUC is most pronounced when the number of training examples is small.

CONCLUSIONS A classification learning framework that learns from auxiliary soft-label information provided by a human expert is a promising new direction for learning classification models from expert labels, reducing the time and cost needed to label data.
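As a rough illustration of what learning from probabilistic soft labels can look like, the sketch below fits a logistic model by minimizing cross-entropy against expert-supplied probabilities instead of hard 0/1 labels, then scores it with a pairwise AUC estimate. This is an assumption made for illustration only and is not the paper's algorithm: the function names (`fit_soft_logistic`, `auc`), the hyperparameters, and the toy data are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_soft_logistic(X, p, lr=0.1, epochs=2000, l2=1e-3):
    """Gradient descent on cross-entropy between predictions and soft labels p.

    X: (n, d) feature matrix; p: (n,) expert probabilities in [0, 1].
    Hypothetical helper, not the method proposed in the paper.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        residual = sigmoid(X @ w + b) - p          # gradient of the loss w.r.t. the logits
        w -= lr * (X.T @ residual / n + l2 * w)    # L2-regularized weight update
        b -= lr * residual.mean()
    return w, b


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]             # all positive-minus-negative score gaps
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Toy data (made up): one feature, binary labels y, and soft labels p that
# express how strongly a hypothetical expert believes each label.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.3, 0.7, 0.9])

w, b = fit_soft_logistic(X, p)
print("weights:", w, "bias:", b)
print("AUC against binary labels:", auc(X @ w + b, y))
```

The soft targets are used only during training; evaluation still ranks instances against the binary labels, matching the AUC-based comparison described above.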
