The Set Classification Problem and Solution Methods

This paper focuses on developing classification algorithms for problems in which there is a need to predict the class based on multiple observations (examples) of the same phenomenon (class). These problems give rise to a new classification problem, referred to as set classification, that requires the prediction of a set of instances given the prior knowledge that all the instances of the set belong to the same unknown class. This problem falls under the general class of problems whose instances have class label dependencies. Four methods for solving the set classification problem are developed and studied. The first is based on a straightforward extension of the traditional classification paradigm whereas the other three are designed to explicitly take into account the known dependencies among the instances of the unlabeled set during learning or classification. A comprehensive experimental evaluation of the various methods and their underlying parameters shows that some of them lead to significant gains in performance.

[1]  Zhi-Hua Zhou,et al.  Solving multi-instance problems with classifier ensemble based on constructive clustering , 2007, Knowledge and Information Systems.

[2]  George Karypis,et al.  Comparison of descriptor spaces for chemical compound retrieval and classification , 2006, Sixth International Conference on Data Mining (ICDM'06).

[3]  Zhi-Hua Zhou,et al.  Multi-Instance Multi-Label Learning with Application to Scene Classification , 2006, NIPS.

[4]  Xin Wen,et al.  BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities , 2006, Nucleic Acids Res..

[5]  A. Bender,et al.  In silico target fishing: Predicting biological targets from chemical structure , 2006 .

[6]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[7]  Roberto Cipolla,et al.  Face Set Classification using Maximally Probable Mutual Modes , 2006, 18th International Conference on Pattern Recognition (ICPR'06).

[8]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[9]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..

[10]  G. Karypis,et al.  Frequent sub-structure-based approaches for classifying chemical compounds , 2005, Third IEEE International Conference on Data Mining.

[11]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[12]  Jiebo Luo,et al.  Learning multi-label scene classification , 2004, Pattern Recognit..

[13]  Jennifer Neville,et al.  Why collective inference improves relational classification , 2004, KDD.

[14]  George Karypis,et al.  Frequent substructure-based approaches for classifying chemical compounds , 2003, IEEE Transactions on Knowledge and Data Engineering.

[15]  Constance de Koning,et al.  Editors , 2003, Annals of Emergency Medicine.

[16]  Lise Getoor,et al.  Link-Based Classification , 2003, Encyclopedia of Machine Learning and Data Mining.

[17]  Ben Taskar,et al.  Discriminative Probabilistic Models for Relational Data , 2002, UAI.

[18]  Thomas Gärtner,et al.  Multi-Instance Kernels , 2002, ICML.

[19]  Trevor Darrell,et al.  Face Recognition from Long-Term Observations , 2002, ECCV.

[20]  Tomaso A. Poggio,et al.  Face recognition with support vector machines: global versus component-based approach , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[21]  Jason Weston,et al.  A kernel method for multi-labelled classification , 2001, NIPS.

[22]  B. Roth,et al.  The Multiplicity of Serotonin Receptors: Uselessly Diverse Molecules or an Embarrassment of Riches? , 2000 .

[23]  Jun Wang,et al.  Solving the Multiple-Instance Problem: A Lazy Learning Approach , 2000, ICML.

[24]  Vicki Bruce,et al.  Face Recognition: From Theory to Applications , 1999 .

[25]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[26]  D. B. Graham,et al.  Characterising Virtual Eigensignatures for General Purpose Face Recognition , 1998 .

[27]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[28]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[29]  Piotr Indyk,et al.  Enhanced hypertext categorization using hyperlinks , 1998, SIGMOD '98.

[30]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[31]  Marti A. Hearst,et al.  Reexamining the cluster hypothesis: scatter/gather on retrieval results , 1996, SIGIR '96.

[32]  金田 重郎,et al.  C4.5: Programs for Machine Learning (書評) , 1995 .

[33]  Xiaojin Zhu,et al.  Semi-Supervised Learning , 2010, Encyclopedia of Machine Learning.

[34]  John A. Tallarico,et al.  Integrating high-content screening and ligand-target prediction to identify mechanism of action. , 2008, Nature chemical biology.

[35]  Anthony Nichols,et al.  High content screening as a screening tool in drug discovery. , 2007, Methods in molecular biology.

[36]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[37]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[38]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .