Gene-Pair Representation and Incorporation of GO-based Semantic Similarity into Classification of Gene Expression Data

To emphasize gene interactions in classification algorithms, a new representation is proposed that is built from gene pairs rather than single genes. Each pair is represented by the L1 difference (the absolute difference) of the two corresponding expression values. The representation is evaluated on benchmark datasets and is shown to often increase classification accuracy for genetic datasets. By combining the gene-pair representation with the Gene Ontology (GO), the semantic similarity of the genes in each pair can be used to pre-select pairs with a high similarity value. This GO-based feature selection approach is compared with plain data-driven selection and is shown to often increase classification accuracy.
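The two steps described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names `gene_pair_features` and `preselect_pairs`, the toy expression values, the similarity scores, and the threshold of 0.5 are all hypothetical and chosen only for illustration. In practice the similarity scores would come from a GO-based semantic similarity measure, which is not computed here.

```python
from itertools import combinations


def gene_pair_features(samples):
    """Turn per-gene expression values into per-pair features.

    samples: list of rows, one row of expression values per sample.
    Returns (features, pairs), where pairs lists the gene index pairs (i, j)
    with i < j and features[s][k] is |samples[s][i] - samples[s][j]| for the
    k-th pair (the L1 difference of the two expression values).
    """
    n_genes = len(samples[0])
    pairs = list(combinations(range(n_genes), 2))
    features = [[abs(row[i] - row[j]) for i, j in pairs] for row in samples]
    return features, pairs


def preselect_pairs(pairs, similarity, threshold):
    """Return positions of pairs whose similarity score reaches the threshold.

    similarity: dict mapping (i, j) to a score; missing pairs count as 0.0.
    """
    return [k for k, pair in enumerate(pairs)
            if similarity.get(pair, 0.0) >= threshold]


# Toy data: 2 samples, 3 genes (hypothetical values).
samples = [[1.0, 4.0, 2.0],
           [3.0, 3.5, 0.5]]
features, pairs = gene_pair_features(samples)

# Hypothetical similarity scores standing in for a GO-based measure.
go_similarity = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.7}
keep = preselect_pairs(pairs, go_similarity, threshold=0.5)

# Keep only the pre-selected pair features for each sample.
selected = [[row[k] for k in keep] for row in features]
```

Note that the number of pairs grows quadratically with the number of genes, which is one practical reason a pre-selection step is attractive before a classifier is trained on the pair features.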
