ReliefF-MI: An extension of ReliefF to multiple instance learning

In machine learning, the so-called curse of dimensionality, which affects many classification algorithms, denotes the drastic increase in computational complexity and classification error that occurs when data have a large number of dimensions. In this context, feature selection techniques try to reduce dimensionality by finding a new, more compact representation of instances: they select the most informative features and remove redundant, irrelevant, and/or noisy ones. In this paper, we propose ReliefF-MI, a filter-based feature selection method for the multiple-instance learning scenario, based on the principles of the well-known ReliefF algorithm. Several extensions are designed and implemented, and their performance is evaluated in multiple-instance learning. ReliefF-MI is applied as a pre-processing step that is completely independent of the multi-instance classifier learning process, and it is therefore more efficient and generic than the wrapper approaches proposed in this area. Experimental results on five real-world benchmark data sets and 17 classification algorithms confirm the utility and efficiency of this method, both statistically and in terms of execution time.
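To make the idea of a Relief-style filter over bags concrete, the following is a minimal sketch and not the ReliefF-MI algorithm itself. It rests on several assumptions that the abstract does not state: bags are compared with a minimal Hausdorff distance (the closest pair of instances), only one nearest hit and one nearest miss are used per bag (the original Relief scheme rather than ReliefF's k neighbours), and every bag is visited once instead of being sampled. The names `bag_distance`, `closest_pair`, and `relief_bag_weights` are hypothetical.

```python
import math


def instance_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(bag_a, bag_b):
    """Return the pair of instances (one per bag) that are closest together."""
    pairs = ((a, b) for a in bag_a for b in bag_b)
    return min(pairs, key=lambda p: instance_distance(p[0], p[1]))


def bag_distance(bag_a, bag_b):
    """Assumed bag metric: minimal Hausdorff distance (closest instance pair)."""
    a, b = closest_pair(bag_a, bag_b)
    return instance_distance(a, b)


def relief_bag_weights(bags, labels):
    """Relief-style feature weights for labelled bags (illustrative only).

    A feature gains weight when it differs across the nearest bag of another
    class (miss) and loses weight when it differs across the nearest bag of
    the same class (hit).
    """
    n_features = len(bags[0][0])
    # Per-feature value range, used to normalise differences to [0, 1].
    spans = []
    for f in range(n_features):
        values = [inst[f] for bag in bags for inst in bag]
        spans.append((max(values) - min(values)) or 1.0)

    weights = [0.0] * n_features
    indices = range(len(bags))
    for i in indices:
        hits = [j for j in indices if j != i and labels[j] == labels[i]]
        misses = [j for j in indices if labels[j] != labels[i]]
        if not hits or not misses:
            continue  # no usable neighbours for this bag
        hit = min(hits, key=lambda j: bag_distance(bags[i], bags[j]))
        miss = min(misses, key=lambda j: bag_distance(bags[i], bags[j]))
        a_hit, b_hit = closest_pair(bags[i], bags[hit])
        a_miss, b_miss = closest_pair(bags[i], bags[miss])
        for f in range(n_features):
            diff_miss = abs(a_miss[f] - b_miss[f]) / spans[f]
            diff_hit = abs(a_hit[f] - b_hit[f]) / spans[f]
            weights[f] += (diff_miss - diff_hit) / len(bags)
    return weights


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    bags = [
        [(0.9, 0.1), (0.8, 0.9)],
        [(1.0, 0.5), (0.85, 0.2)],
        [(0.1, 0.2), (0.0, 0.8)],
        [(0.2, 0.5), (0.05, 0.9)],
    ]
    labels = [1, 1, 0, 0]
    weights = relief_bag_weights(bags, labels)
    ranking = sorted(range(len(weights)), key=lambda f: -weights[f])
    print(weights, ranking)
```

Because the weights are computed once, before any classifier is trained, the resulting ranking can be reused with any multi-instance classifier, which is the property that distinguishes a filter from a wrapper approach.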
