ML-KNN: A lazy learning approach to multi-label learning

Multi-label learning originated from the investigation of text categorization, where each document may belong to several predefined topics simultaneously. In multi-label learning, the training set is composed of instances, each associated with a set of labels, and the task is to predict the label sets of unseen instances by analyzing training instances with known label sets. In this paper, a multi-label lazy learning approach named ML-KNN is presented, which is derived from the traditional K-nearest neighbor (KNN) algorithm. In detail, for each unseen instance, its K nearest neighbors in the training set are first identified. Then, based on statistical information gained from the label sets of these neighboring instances, i.e., the number of neighboring instances belonging to each possible class, the maximum a posteriori (MAP) principle is utilized to determine the label set for the unseen instance. Experiments on three different real-world multi-label learning problems, i.e., yeast gene functional analysis, natural scene classification, and automatic web page categorization, show that ML-KNN achieves superior performance to some well-established multi-label learning algorithms.
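
The procedure summarized above admits a compact implementation. Below is a minimal sketch in Python with NumPy, assuming a binary label matrix Y of shape (m, q) and a feature matrix X of shape (m, d); the class name MLKNN, the Euclidean-distance neighbor search, and the Laplace smoothing parameter s are illustrative assumptions for this sketch, not details fixed by the abstract.

```python
# Minimal ML-KNN sketch (illustrative, not the authors' reference code).
import numpy as np

class MLKNN:
    def __init__(self, k=10, s=1.0):
        self.k = k  # number of nearest neighbors
        self.s = s  # Laplace smoothing parameter (assumed)

    def _neighbors(self, x, X, exclude=None):
        # Indices of the k nearest training instances by Euclidean distance.
        d = np.linalg.norm(X - x, axis=1)
        if exclude is not None:
            d[exclude] = np.inf  # leave the instance itself out during training
        return np.argsort(d)[:self.k]

    def fit(self, X, Y):
        m, q = Y.shape
        self.X, self.Y = X, Y
        # Prior probabilities that each label is relevant / irrelevant.
        self.prior1 = (self.s + Y.sum(axis=0)) / (2 * self.s + m)
        self.prior0 = 1.0 - self.prior1
        # c1[l, j]: training instances having label l whose k neighbors
        # contain exactly j instances with label l; c0 likewise for
        # instances lacking label l.
        c1 = np.zeros((q, self.k + 1))
        c0 = np.zeros((q, self.k + 1))
        for i in range(m):
            nn = self._neighbors(X[i], X, exclude=i)
            counts = Y[nn].sum(axis=0).astype(int)  # per-label neighbor counts
            for l in range(q):
                if Y[i, l] == 1:
                    c1[l, counts[l]] += 1
                else:
                    c0[l, counts[l]] += 1
        # Smoothed likelihoods of observing j relevant neighbors.
        self.like1 = (self.s + c1) / (self.s * (self.k + 1) + c1.sum(axis=1, keepdims=True))
        self.like0 = (self.s + c0) / (self.s * (self.k + 1) + c0.sum(axis=1, keepdims=True))
        return self

    def predict(self, x):
        nn = self._neighbors(x, self.X)
        counts = self.Y[nn].sum(axis=0).astype(int)
        q = self.Y.shape[1]
        post1 = self.prior1 * self.like1[np.arange(q), counts]
        post0 = self.prior0 * self.like0[np.arange(q), counts]
        return (post1 > post0).astype(int)  # MAP decision per label
```

Calling MLKNN().fit(X_train, Y_train) and then predict(x) returns a binary vector marking the predicted label set of an unseen instance; the final comparison is where the MAP principle turns the neighbor-count statistics into an independent relevant/irrelevant decision for each label.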
