Learning on Probabilistic Labels

Classification is a fundamental topic in the data mining literature, and recent research directions such as active learning and transfer learning build on it. Probabilistic label information is increasingly prevalent and arises naturally in applications such as crowdsourcing and pattern recognition. In this paper, we focus on datasets that provide probabilistic label information for classification. For such a probabilistic dataset, we propose a classifier and derive a theoretical bound relating its error rate to the number of training instances. Interestingly, this bound is asymptotically no worse than the best previously known bounds derived for traditional (hard-labeled) datasets. Furthermore, our classifier enjoys a fast convergence rate relative to traditional classifiers. Experimental results show that the proposed algorithm achieves higher accuracy than the traditional algorithm. We believe this work is significant because it opens a new line of research on probabilistic datasets, allowing classification-related topics such as active learning and transfer learning to be studied under this probabilistic setting.
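The abstract does not spell out the classifier itself, so the following is only a minimal sketch of the general setting it describes: each training instance carries a soft (probabilistic) label p_i ≈ P(y = 1 | x_i) rather than a hard label, and a plug-in style classifier is built by estimating that conditional probability from the soft labels and thresholding at 1/2. The class name, the k-NN regression estimator, and all parameters below are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of learning from probabilistic labels (illustrative only).
# Assumption: soft labels p_i in [0, 1] are given per instance; we estimate
# eta(x) = P(y = 1 | x) by averaging the soft labels of nearby training
# points and predict the class by thresholding at 1/2.
import numpy as np


class PluginProbLabelClassifier:
    """k-NN regression on soft labels, thresholded at 1/2 (hypothetical sketch)."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, p):
        # X: (n, d) feature matrix; p: (n,) probabilistic labels in [0, 1].
        self.X_ = np.asarray(X, dtype=float)
        self.p_ = np.asarray(p, dtype=float)
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distances between query points and training points.
        d2 = ((X[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        nn = np.argsort(d2, axis=1)[:, : self.k]
        # Estimate eta(x) by averaging the neighbours' soft labels.
        return self.p_[nn].mean(axis=1)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))          # true P(y = 1 | x)
    p = np.clip(eta + rng.normal(0, 0.05, 200), 0, 1)   # noisy probabilistic labels
    clf = PluginProbLabelClassifier(k=7).fit(X, p)
    print(clf.predict(np.array([[0.8, 0.0], [-0.8, 0.0]])))  # expect [1 0]
```

The design choice here mirrors the intuition behind the claimed fast rates: a soft label conveys strictly more information than a hard label drawn from the same conditional distribution, so an estimator of P(y = 1 | x) can be fit directly to the soft labels instead of being recovered from noisy 0/1 observations.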
