Single- vs. multiple-instance classification

In multiple-instance (MI) classification, each input object or event is represented by a set of instances, called a bag, and only the bag carries a label; the individual instances in a bag are unlabeled. MI learning is used in applications where data naturally comes in the form of such bags. We review MI classification from the point of view of the label information carried by the instances in a bag, that is, whether they are sufficient for classification. Our aim is to contrast MI with the standard single-instance (SI) classification approach and to determine when casting a problem in the MI framework is preferable. Using the support vector machine as the base classifier, we compare instance-level classification, combination of instance outputs by noisy-or, and bag-level classification. We define a set of synthetic MI tasks of different complexities to benchmark these MI approaches. Our experiments on these tasks and on two real-world bioinformatics applications, gene expression and text categorization, indicate that, depending on the situation, a different decision mechanism, at the instance level or at the bag level, may be appropriate. If the instances in a bag provide complementary information, a bag-level MI approach is useful; but sometimes the bag structure carries no useful information at all, and an instance-level SI classifier works equally well or better.

Highlights
- We categorize problems by the amount of label information the instances in a bag carry.
- We define synthetic tasks of increasing complexity or intra-bag dependency.
- These problems allow us to measure the power of multiple-instance algorithms.
- We experiment on two bioinformatics data sets for gene expression and text categorization.
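To make the compared decision mechanisms concrete, here is a minimal sketch. It assumes that some instance-level classifier (for example, an SVM with probabilistic outputs, not shown) has already produced a positive-class probability p_i for every instance. The noisy-or rule treats the instances as independent causes and scores the bag as 1 - ∏(1 - p_i). For contrast, it also includes one possible bag-level representation, the mean instance vector, which can be passed to any SI classifier. The function names, the 0.5 threshold, and the mean embedding are illustrative assumptions, not the paper's own implementation.

```python
from math import prod
from typing import List, Sequence


def noisy_or(instance_probs: Sequence[float]) -> float:
    """Bag-level positive probability under the noisy-or rule.

    Each p_i is an instance's estimated probability of being positive,
    assumed to come from an already-trained instance classifier.
    The bag is negative only if every instance is negative:
    P(bag positive) = 1 - prod_i (1 - p_i). An empty bag scores 0.
    """
    for p in instance_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - prod(1.0 - p for p in instance_probs)


def classify_bag(instance_probs: Sequence[float], threshold: float = 0.5) -> int:
    """Label a bag 1 (positive) or 0 (negative); the 0.5 threshold is an assumption."""
    return int(noisy_or(instance_probs) >= threshold)


def bag_mean_embedding(bag: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical bag-level representation: the mean instance vector.

    The resulting fixed-length vector can be fed to any SI classifier
    (e.g. an SVM), so the decision is made once per bag.
    """
    if not bag:
        raise ValueError("bag must contain at least one instance")
    dim = len(bag[0])
    return [sum(inst[j] for inst in bag) / len(bag) for j in range(dim)]


if __name__ == "__main__":
    weak_bag = [0.2, 0.2, 0.2, 0.2]
    print(round(noisy_or(weak_bag), 4))                  # 0.5904
    print(classify_bag(weak_bag))                        # 1
    print(bag_mean_embedding([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Because noisy-or accumulates evidence, a bag of four instances with p_i = 0.2 each scores about 0.59 and is labeled positive even though no single instance looks positive, whereas averaging the same instance probabilities would give only 0.2. This sensitivity to bag size is one illustration of why the choice between instance-level and bag-level decisions matters.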
