Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis

Microarray analysis is widely accepted for human cancer diagnosis and classification. However, the high dimensionality of microarray data poses a great challenge to classification. Gene selection plays a key role in identifying, from the thousands of genes in microarray data, the salient genes that directly contribute to the symptoms of a disease. Although various excellent selection methods are currently available, they share a common problem: genes that have strong discriminatory power as a group but are weak individually are discarded. In this paper, a new gene selection method is proposed for cancer diagnosis and classification that retains useful intrinsic groups of interdependent genes. The primary characteristic of this method is that the relevance between each gene and the target is dynamically updated whenever a new gene is selected. The effectiveness of our method is validated by experiments on six publicly available microarray data sets. Experimental results show that the classification performance and enrichment score achieved by our proposed method are better than those of other selection methods.
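
To make the idea of dynamically updated relevance concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes discretized expression values and re-scores each remaining gene by its conditional mutual information with the class given the genes already selected, averaged over them; this update rule, the function names, and the toy genes `gene_a`, `gene_b`, and `gene_c` are all assumptions made for illustration. A static ranking by individual mutual information is included for comparison.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(rows):
    """Shannon entropy (bits) of the empirical distribution over `rows`."""
    counts = Counter(rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def cond_mutual_info(x, y, z=None):
    """I(x; y | z) in bits from discrete samples; z=None gives plain I(x; y)."""
    if z is None:
        z = [0] * len(x)  # constant conditioning variable
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(list(z)))


def static_ranking(genes, target, k):
    """Baseline: rank genes once by individual relevance I(gene; target)."""
    scores = {g: cond_mutual_info(v, target) for g, v in genes.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def dynamic_selection(genes, target, k):
    """Greedy selection that re-scores remaining genes after every pick.

    Assumed update rule (illustrative only): the relevance of a remaining
    gene becomes the mean of I(gene; target | s) over selected genes s.
    """
    relevance = {g: cond_mutual_info(v, target) for g, v in genes.items()}
    selected = []
    while relevance and len(selected) < k:
        best = max(relevance, key=relevance.get)
        selected.append(best)
        del relevance[best]
        for g in relevance:
            relevance[g] = sum(
                cond_mutual_info(genes[g], target, genes[s]) for s in selected
            ) / len(selected)
    return selected


def toy_data():
    """Hypothetical data: gene_b is useless alone but tells when gene_a is reliable."""
    rows = list(product([0, 1], [0, 1], [0, 1], range(5)))  # (y, b, r, n)
    target = [y for y, b, r, n in rows]
    genes = {
        "gene_a": [y if b == 0 else r for y, b, r, n in rows],      # equals y only when b == 0
        "gene_b": [b for y, b, r, n in rows],                       # independent of y on its own
        "gene_c": [y if n < 3 else 1 - y for y, b, r, n in rows],   # weak noisy copy of y
    }
    return genes, target


if __name__ == "__main__":
    genes, target = toy_data()
    print("static :", static_ranking(genes, target, 2))
    print("dynamic:", dynamic_selection(genes, target, 2))
```

On this toy data the static ranking keeps `gene_a` and `gene_c`, because `gene_b` carries no information about the class on its own. The dynamic variant keeps `gene_a` and `gene_b`, since `gene_b` becomes informative once `gene_a` has been chosen. Conditioning on one selected gene at a time is a simplification that keeps the estimates tractable on small samples.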