An Efficient and Provable Approach for Mixture Proportion Estimation Using Linear Independence Assumption

In this paper, we study the mixture proportion estimation (MPE) problem in a new setting: given samples from both the mixture and the component distributions, we identify the proportions of the components in the mixture distribution. To address this problem, we rely on a linear independence assumption, i.e., the component distributions are linearly independent of one another, which is much weaker than the assumptions exploited by previous MPE methods. Based on this assumption, we propose a method (1) that uniquely identifies the mixture proportions, (2) whose output provably converges to the optimal solution, and (3) that is computationally efficient. We demonstrate the superiority of the proposed method over state-of-the-art methods in two applications, learning with label noise and semi-supervised learning, on both synthetic and real-world datasets.
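The abstract does not spell out the estimator itself, so the following is only a minimal sketch of an embedding-based MPE approach in the spirit described above: empirical kernel mean embeddings of the mixture and of each component are matched by constrained least squares over the probability simplex, where linear independence of the component embeddings is what makes the minimizer unique. The function names (`rbf_kernel`, `estimate_proportions`), the RBF-kernel choice, and the SLSQP solver are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)


def estimate_proportions(component_samples, mixture_samples, gamma=1.0):
    """Estimate mixture proportions by matching kernel mean embeddings.

    Solves  min_pi || mu_F - sum_i pi_i mu_{P_i} ||_H^2
    subject to pi >= 0 and sum(pi) = 1, where the mu's are empirical
    kernel mean embeddings. If the component embeddings are linearly
    independent, the minimizer is unique.
    """
    k = len(component_samples)

    # Quadratic form of the objective:
    #   A[i, j] = <mu_{P_i}, mu_{P_j}>_H,   b[i] = <mu_{P_i}, mu_F>_H,
    # each estimated by averaging the corresponding kernel matrix.
    A = np.empty((k, k))
    b = np.empty(k)
    for i in range(k):
        for j in range(k):
            A[i, j] = rbf_kernel(
                component_samples[i], component_samples[j], gamma
            ).mean()
        b[i] = rbf_kernel(component_samples[i], mixture_samples, gamma).mean()

    # Minimize pi^T A pi - 2 b^T pi over the probability simplex
    # (the constant ||mu_F||^2 term is dropped).
    def objective(pi):
        return pi @ A @ pi - 2.0 * b @ pi

    constraints = {"type": "eq", "fun": lambda pi: pi.sum() - 1.0}
    bounds = [(0.0, 1.0)] * k
    pi0 = np.full(k, 1.0 / k)
    result = minimize(objective, pi0, bounds=bounds, constraints=constraints)
    return result.x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Gaussian components; the mixture is drawn with true
    # proportions (0.3, 0.7).
    P1 = rng.normal(loc=-2.0, size=(500, 1))
    P2 = rng.normal(loc=+2.0, size=(500, 1))
    n_from_P1 = rng.binomial(1000, 0.3)
    F = np.vstack([
        rng.normal(loc=-2.0, size=(n_from_P1, 1)),
        rng.normal(loc=+2.0, size=(1000 - n_from_P1, 1)),
    ])
    print(estimate_proportions([P1, P2], F))  # roughly [0.3, 0.7]
```

A simplex-constrained quadratic program like this is convex, so any off-the-shelf solver suffices; the linear independence assumption only matters for uniqueness of the solution, not for convexity.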
