Learning from label proportions with pinball loss

Learning from label proportions is a learning problem that has drawn much attention in recent years. Unlike conventional supervised learning, it groups instances into bags and uses only the label proportion of each bag rather than the label of each instance. Because instance-level labels are not always obtainable, it has been widely applied in areas such as modeling voting behavior and spam filtering. However, learning from label proportions still faces serious challenges, including the interference of noise and the improper partition of bags. In this paper, we propose a novel learning from label proportions method based on the pinball loss, called “pSVM-pin”, to address these issues. The pinball loss is introduced to produce an effective classifier that reduces the impact of noise. Experimental results demonstrate the accuracy of pSVM-pin compared with competing methods.
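The abstract does not give the loss itself, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It assumes the pinball loss takes the form commonly used for SVM classifiers, applied to the margin u = 1 - y f(x): the loss is u when u >= 0 and -tau * u otherwise. The function names `pinball_loss` and `bag_proportions` and the parameter `tau` are hypothetical and are not taken from the paper.

```python
import numpy as np

def pinball_loss(y, scores, tau=0.5):
    """Assumed pinball loss on the margin u = 1 - y * f(x).

    Returns u where u >= 0 and -tau * u where u < 0.
    With tau = 0 this reduces to the hinge loss.
    """
    u = 1.0 - y * scores
    return np.where(u >= 0, u, -tau * u)

def bag_proportions(labels, bag_ids):
    """Fraction of positive (+1) labels in each bag.

    In the label-proportion setting only these fractions are
    observed during training, not the individual labels.
    """
    return {int(b): float(np.mean(labels[bag_ids == b] == 1))
            for b in np.unique(bag_ids)}

# Illustrative data only: four instances split into two bags.
labels = np.array([1, -1, 1, 1])
bag_ids = np.array([0, 0, 1, 1])
scores = np.array([2.0, 0.5, 0.3, 1.0])

print(bag_proportions(labels, bag_ids))
print(pinball_loss(labels, scores, tau=0.5))
```

Unlike the hinge loss, this form also penalizes points that lie well beyond the margin (u < 0), which is the usual argument for why a pinball-loss classifier is less sensitive to noise near the decision boundary.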
