Ensemble Feature Ranking Applied to Medical Data

Abstract: Reducing the feature space in classification is a critical, although sensitive, task, since it depends on a certain definition of relevance. Feature selection has motivated much research. In medical datasets, the relevant attributes are often unknown a priori. Feature selection provides the features that contribute most to the classification task per se, which should therefore be used by any classifier to produce a classification model. However, the dimensionality of the feature space may prevent the application of feature selection algorithms because of their time and space complexity. In this work, we are concerned with the application of an efficient feature ranking algorithm to a given breast cancer dataset that overcomes the dimensionality of the data.
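The abstract does not spell out the ranking algorithm, so the following is only a minimal sketch of what ensemble feature ranking can look like in general, not the authors' method. It assumes a binary 0/1 class label and numeric features. The two scoring functions (`abs_correlation`, `mean_gap`), the rank-averaging rule, and the toy data are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature and the 0/1 label."""
    mx, my = mean(column), mean(labels)
    sx, sy = pstdev(column), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no information
    cov = mean((x - mx) * (y - my) for x, y in zip(column, labels))
    return abs(cov / (sx * sy))


def mean_gap(column, labels):
    """Gap between the two class means, scaled by the feature's spread."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    spread = pstdev(column)
    if not pos or not neg or spread == 0:
        return 0.0
    return abs(mean(pos) - mean(neg)) / spread


def to_ranks(scores):
    """Rank 1 goes to the highest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    ranks = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        ranks[feature] = position
    return ranks


def ensemble_rank(rows, labels, scorers=(abs_correlation, mean_gap)):
    """Return feature indices ordered from most to least relevant.

    Each scorer ranks every feature on its own; the ensemble ordering
    is obtained by averaging those ranks (assumed aggregation rule).
    """
    columns = list(zip(*rows))
    all_ranks = [to_ranks([score(col, labels) for col in columns])
                 for score in scorers]
    avg_rank = [mean(r[j] for r in all_ranks) for j in range(len(columns))]
    return sorted(range(len(columns)), key=lambda j: (avg_rank[j], j))


# Toy data (made up): feature 0 separates the classes, feature 1 is noisy,
# feature 2 is constant.
rows = [
    [1.0, 5.0, 2.0],
    [1.2, 3.0, 2.0],
    [0.9, 4.0, 2.0],
    [3.0, 4.5, 2.0],
    [3.1, 3.5, 2.0],
    [2.9, 4.2, 2.0],
]
labels = [0, 0, 0, 1, 1, 1]
print(ensemble_rank(rows, labels))
```

Because each scorer looks at one feature at a time, the cost grows linearly with the number of features, which is why a ranking approach of this kind can stay tractable when subset-search methods do not.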
