Paper Information - A Comparison of Multi-instance Learning Algorithms

A Comparison of Multi-instance Learning Algorithms

Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult as the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are specifically designed for MI problems whereas others have been upgraded or adapted from standard single-instance learning algorithms. Most algorithms have been evaluated on only one or two benchmark datasets, and there is a lack of systematic comparisons of MI learning algorithms. This thesis presents a comprehensive study of MI learning algorithms that aims to compare their performance and find a suitable way to properly address different MI problems. First, it briefly reviews the history of research on MI learning. Then it discusses five general classes of MI approaches that cover a total of 16 MI algorithms. After that, it presents empirical results for these algorithms that were obtained from 15 datasets which involve five different real-world application domains. Finally, some conclusions are drawn from these results: (1) applying suitable standard single-instance learners to MI problems can often generate the best result on the datasets that were tested, (2) algorithms exploiting the standard asymmetric MI assumption do not show significant advantages over approaches using the so-called collective assumption, and (3) different MI approaches are suitable for different application domains, and no MI algorithm works best on all MI problems.

Lin Dong
