Supervised versus multiple instance learning: an empirical comparison

We empirically study the relationship between supervised and multiple instance (MI) learning. Algorithms for learning various concepts have been adapted to the MI representation. However, it is also known that concepts that are PAC-learnable under one-sided noise can be learned from MI data. A relevant question then is: how well do supervised learners perform on MI data? We attempt to answer this question by examining a cross-section of MI data sets from various domains, coupled with a number of learning algorithms including Diverse Density, Logistic Regression, nonlinear Support Vector Machines, and FOIL. We consider both a supervised and an MI version of each learner. Several interesting conclusions emerge from our work: (1) no MI algorithm is superior across all tested domains, (2) some MI algorithms are consistently superior to their supervised counterparts, (3) using high false-positive costs can improve a supervised learner's performance in MI domains, and (4) in several domains, a supervised algorithm is superior to any MI algorithm we tested.
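
To make the supervised-on-MI setup concrete, the sketch below shows one common way to apply a supervised learner to MI data: propagate each bag's label to its instances, train with a higher cost on false positives (conclusion 3 above), and call a bag positive if any of its instances is predicted positive. This is only an illustration under the standard MI assumption; the scikit-learn classifier, the toy bags, and the 5:1 cost ratio are assumptions for the example, not the paper's exact experimental protocol.

```python
# Minimal sketch: applying a supervised learner to multiple-instance (MI) data
# by propagating each bag's label to its instances and penalizing false
# positives more heavily. Assumes scikit-learn; the data and cost ratio are
# illustrative, not taken from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bags_to_instances(bags, bag_labels):
    """Flatten MI data: every instance inherits its bag's label."""
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lbl) for b, lbl in zip(bags, bag_labels)])
    return X, y

def predict_bags(clf, bags):
    """Standard MI assumption: a bag is positive iff any instance is predicted positive."""
    return np.array([int(clf.predict(b).max() == 1) for b in bags])

# Toy MI data: two positive bags and two negative bags of 2-D instances.
rng = np.random.default_rng(0)
pos_bags = [rng.normal(loc=2.0, size=(5, 2)), rng.normal(loc=2.0, size=(4, 2))]
neg_bags = [rng.normal(loc=0.0, size=(6, 2)), rng.normal(loc=0.0, size=(5, 2))]
bags = pos_bags + neg_bags
bag_labels = [1, 1, 0, 0]

X, y = bags_to_instances(bags, bag_labels)

# Weight the negative class more heavily so that false positives are costlier;
# the 5:1 ratio is an arbitrary illustration of the idea.
clf = LogisticRegression(class_weight={0: 5.0, 1: 1.0}).fit(X, y)
print(predict_bags(clf, bags))
```

The false-positive cost matters because label propagation mislabels the negative instances inside positive bags; penalizing false positives pushes the supervised learner toward a decision boundary that tolerates those noisy positives, which is one way to read conclusion (3).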
