A graph-based representation of Gene Expression profiles in DNA microarrays

This paper proposes a new, flexible data model, called the gene expression graph (GEG), for gene expression analysis and classification. Three features differentiate GEGs from other available microarray data representation structures: (i) the memory occupation of a GEG is independent of the number of samples used to build it; (ii) a GEG more clearly expresses relationships among expressed and non-expressed genes in experiments on both healthy and diseased tissues; (iii) GEGs make it easy to implement very efficient classifiers. The paper also presents a simple classifier for sample-based classification to show the flexibility and user-friendliness of the proposed data structure.
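The abstract does not give the formal definition of a GEG, so the following is only an illustrative sketch of the general idea, not the structure or classifier defined in the paper. It assumes a hypothetical graph in which nodes are genes and each edge counts the samples in which both genes were expressed. The class name `CoExpressionGraph`, the `threshold` parameter, and the scoring rule are all assumptions introduced for illustration. The sketch shows why such a structure can have a size bounded by the number of genes rather than the number of samples, and how a simple sample-based classifier might be built on top of it.

```python
from collections import defaultdict
from itertools import combinations


class CoExpressionGraph:
    """Hypothetical stand-in for a GEG (not the paper's definition).

    Nodes are genes; an edge (a, b) counts the samples expressing both genes.
    """

    def __init__(self):
        self.node_count = defaultdict(int)  # gene -> samples expressing it
        self.edge_count = defaultdict(int)  # (gene_a, gene_b) -> co-expressing samples
        self.samples_seen = 0

    @staticmethod
    def _expressed(expression, threshold):
        # Assumption: a gene counts as "expressed" when its level >= threshold.
        return sorted(g for g, level in expression.items() if level >= threshold)

    def add_sample(self, expression, threshold=1.0):
        """Fold one sample (dict: gene -> expression level) into the graph."""
        expressed = self._expressed(expression, threshold)
        for gene in expressed:
            self.node_count[gene] += 1
        for pair in combinations(expressed, 2):
            self.edge_count[pair] += 1
        self.samples_seen += 1

    def similarity(self, expression, threshold=1.0):
        """Average edge support for the gene pairs co-expressed in the sample."""
        if self.samples_seen == 0:
            return 0.0
        expressed = self._expressed(expression, threshold)
        support = sum(self.edge_count.get(pair, 0)
                      for pair in combinations(expressed, 2))
        return support / self.samples_seen


def classify(expression, graphs, threshold=1.0):
    """Return the label of the class graph most similar to the sample."""
    return max(graphs, key=lambda label: graphs[label].similarity(expression, threshold))
```

In this sketch, adding further samples only increments existing counters, so storage grows with the number of distinct genes and gene pairs observed, not with the number of samples. Building one graph per class (for example, healthy and diseased) and calling `classify` on a new sample gives a minimal sample-based classifier under the stated assumptions.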
