On Learning with Label Proportions

Learning from Label Proportions (LLP) is a learning setting in which the training data is provided in groups, or "bags", and only the proportion of each class in each bag is known. The task is to learn a model that predicts the class labels of individual instances. LLP has broad applications in political science, marketing, healthcare, and computer vision. This work addresses the fundamental question of when and why LLP is possible by introducing a general framework, Empirical Proportion Risk Minimization (EPRM). EPRM learns an instance label classifier that matches the given label proportions on the training data. Our result is based on a two-step analysis. First, we provide a VC bound on the generalization error of the bag proportions, and we show that the bag sample complexity is only mildly sensitive to the bag size. Second, we show that, under some mild assumptions, good bag proportion prediction guarantees good instance label prediction. Together, these results provide a formal guarantee that individual labels can indeed be learned in the LLP setting. We discuss applications of the analysis, including justification of LLP algorithms, learning with population proportions, and a paradigm for learning algorithms with privacy guarantees. We also demonstrate the feasibility of LLP with a case study in a real-world setting: predicting income from census data.
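The EPRM idea described above, choosing an instance-level classifier whose predicted bag proportions match the given ones, can be illustrated with a toy sketch. The setup here is an assumption made for illustration and is not taken from the paper: the hypothesis class is a one-dimensional threshold rule `x > t`, the proportion loss is the absolute gap, and the names `proportion_risk` and `eprm_threshold` are hypothetical.

```python
import numpy as np

def proportion_risk(t, bags, proportions):
    """Mean absolute gap between predicted and given positive proportions.

    The classifier is the (assumed) threshold rule: label 1 if x > t.
    """
    gaps = [abs(np.mean(bag > t) - p) for bag, p in zip(bags, proportions)]
    return float(np.mean(gaps))

def eprm_threshold(bags, proportions):
    """Exhaustively search thresholds and return the one with the lowest
    empirical proportion risk, together with that risk."""
    xs = np.unique(np.concatenate(bags))  # sorted distinct instance values
    # Candidate thresholds: one below all points, midpoints, one above all points.
    candidates = np.concatenate(
        ([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0])
    )
    risks = [proportion_risk(t, bags, proportions) for t in candidates]
    best = int(np.argmin(risks))
    return float(candidates[best]), risks[best]

# Toy data: instance labels are never observed, only per-bag proportions.
# The hidden labeling rule used to produce the proportions is x > 0.
bags = [
    np.array([-2.0, -1.0, 1.0]),
    np.array([-3.0, 2.0, 3.0]),
    np.array([-1.5, -0.5, 0.5, 1.5]),
]
proportions = [1 / 3, 2 / 3, 0.5]

t_hat, risk = eprm_threshold(bags, proportions)
print(f"learned threshold: {t_hat}, proportion risk: {risk}")
```

In this toy case the only candidate threshold that matches all three bag proportions lies between -0.5 and 0.5, so matching proportions also recovers the hidden instance-level rule. Whether this carries over in general depends on the assumptions analyzed in the paper.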
