Gene subset selection in microarray data using entropic filtering for cancer classification

In this work an entropic filtering algorithm (EFA) for feature selection is described as a workable method for generating a relevant subset of genes. The EFA is a fast feature selection method that searches for feature subsets that jointly maximize the normalized multivariate conditional entropy with respect to the ability to classify tumours. The EFA is tested in combination with several machine learning algorithms on five public-domain microarray data sets. This combination yields gene subsets whose accuracies are similar to, or much better than, those obtained with the full set of genes. The solutions are of comparable quality to previously published results, but they are obtained in at most half an hour of computing time and use very few genes.
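The kind of subset search described above can be illustrated with a small greedy filter. This is a minimal sketch under stated assumptions, not the EFA itself. Expression values are assumed to be already discretized into a few levels. The subset score is assumed to be the uncertainty coefficient 1 - H(C | X_S) / H(C), which is one possible normalization of the conditional entropy of the class C given the gene subset S. The search is assumed to be plain forward selection. The function names `entropy`, `conditional_entropy`, `normalized_score` and `greedy_entropic_filter` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(rows, labels, subset):
    """H(C | X_S): class entropy within groups of samples that share the
    same joint discretized values on the genes in `subset`."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[j] for j in subset)
        groups.setdefault(key, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def normalized_score(rows, labels, subset):
    """Assumed criterion: 1 - H(C | X_S) / H(C), in [0, 1].
    0 means the subset says nothing about the class; 1 means it
    determines the class exactly on the training samples."""
    h_c = entropy(labels)
    if h_c == 0:
        return 1.0  # a single class is trivially determined
    return 1.0 - conditional_entropy(rows, labels, subset) / h_c


def greedy_entropic_filter(rows, labels, max_genes):
    """Hypothetical forward selection: repeatedly add the gene that most
    increases the joint score of the subset; stop when no gene improves
    it, the score reaches 1, or `max_genes` genes have been chosen."""
    selected, best = [], 0.0
    candidates = set(range(len(rows[0])))
    while candidates and len(selected) < max_genes:
        # Score every candidate jointly with the genes already selected;
        # ties go to the lowest gene index.
        score, j = max(
            ((normalized_score(rows, labels, selected + [j]), j)
             for j in sorted(candidates)),
            key=lambda t: (t[0], -t[1]),
        )
        if score <= best:
            break
        selected.append(j)
        candidates.remove(j)
        best = score
        if best >= 1.0:
            break
    return selected, best


if __name__ == "__main__":
    # Toy data: 4 samples x 3 discretized genes; gene 0 separates the classes.
    rows = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
    labels = ["a", "a", "b", "b"]
    print(greedy_entropic_filter(rows, labels, max_genes=2))  # ([0], 1.0)
```

Because each candidate is scored jointly with the genes already chosen, the criterion rewards genes that add information to the subset rather than genes that are merely individually relevant. A purely greedy search can still miss interactions: if two genes are informative only together (an XOR-like pattern), neither improves the score on its own and the loop above stops without selecting them.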
