Learning from Label Proportions via an Iterative Weighting Scheme and Discriminant Analysis

Learning from label proportions is the learning paradigm in which the training data is provided in groups (or "bags") and only the label proportion of each bag is known. The objective is to learn a model that predicts the class labels of individual instances. This paradigm has a wide range of applications, especially those involving anonymised data. Two iterative strategies are proposed for this type of problem; both optimise the class membership of the instances using the estimated pattern distribution per bag and the label proportions. Discriminant analysis is reformulated to deal with non-crisp class memberships. A thorough set of experiments is conducted to test: (1) the performance gap between these approaches and the fully supervised setting, (2) the potential advantages of optimising class memberships with our proposals, and (3) the influence of factors such as the bag size and the number of classes on performance.
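
To make the idea of non-crisp class memberships concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a linear discriminant model with a shared covariance matrix, fitted from soft membership weights, and a simple reweighting loop that pulls each bag's average membership towards its known label proportions. All function names (`fit_soft_lda`, `posteriors`, `fit_llp`) and parameters (`reg`, `n_iter`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def fit_soft_lda(X, W, reg=1e-6):
    """Fit a shared-covariance discriminant model from soft memberships.

    X: (n, d) instances; W: (n, k) memberships whose rows sum to 1.
    """
    n, d = X.shape
    Nk = np.maximum(W.sum(axis=0), 1e-12)      # effective class counts
    priors = Nk / Nk.sum()
    means = (W.T @ X) / Nk[:, None]            # membership-weighted class means
    cov = np.zeros((d, d))
    for c in range(W.shape[1]):
        D = X - means[c]
        cov += (W[:, c, None] * D).T @ D       # weighted within-class scatter
    cov = cov / n + reg * np.eye(d)            # small ridge for stability
    return priors, means, cov

def posteriors(X, priors, means, cov):
    """Class posteriors under the linear discriminant model."""
    P = np.linalg.inv(cov)
    quad = np.sum(means @ P * means, axis=1)
    scores = X @ P @ means.T - 0.5 * quad + np.log(priors)
    scores -= scores.max(axis=1, keepdims=True)  # avoid overflow in exp
    E = np.exp(scores)
    return E / E.sum(axis=1, keepdims=True)

def fit_llp(X, bags, proportions, n_iter=50):
    """Illustrative iterative weighting from label proportions.

    bags: (n,) integer bag id per instance.
    proportions: (n_bags, k) known label proportions per bag.
    """
    W = proportions[bags]                      # start from the bag proportions
    for _ in range(n_iter):
        Q = posteriors(X, *fit_soft_lda(X, W))
        for b in range(proportions.shape[0]):
            idx = bags == b
            avg = np.maximum(Q[idx].mean(axis=0), 1e-12)
            Qb = Q[idx] * (proportions[b] / avg)   # pull towards bag proportions
            Q[idx] = Qb / np.maximum(Qb.sum(axis=1, keepdims=True), 1e-12)
        W = Q
    return fit_soft_lda(X, W)
```

After fitting, `posteriors(X_new, *model).argmax(axis=1)` gives instance-level predictions. The per-bag rescaling here is a single heuristic step, so the bag averages match the given proportions only approximately.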