GELA: A Software Tool for the Analysis of Gene Expression Data

Thanks to advances in transcriptome profiling technologies (RNA-seq), biomedical scientists are collecting ever-increasing volumes of gene expression profile data at low cost and high throughput. Automatic knowledge extraction methods are therefore becoming essential for managing these data. In this work, we present GELA (Gene Expression Logic Analyzer), a novel pipeline that performs a knowledge discovery process on RNA-seq gene expression profile data. First, we introduce RNA-seq technologies. Next, we describe our method for analyzing gene expression profile data, which includes normalization, clustering, and classification. Finally, we test our knowledge extraction algorithm on public RNA-seq data sets of Breast Cancer and Stomach Cancer and on public microarray data sets of Psoriasis and Multiple Sclerosis, obtaining promising results in both cases.
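To make the three analysis stages named above (normalization, clustering, classification) concrete, the following is a minimal Python sketch on toy data. It is not GELA's implementation: the abstract does not specify which algorithms the pipeline uses, so min-max scaling, a small k-means, and a nearest-centroid classifier are stand-ins chosen for illustration. The function names and the expression values are hypothetical.

```python
from math import dist
from statistics import mean

def normalize(samples):
    """Min-max scale each gene (column) to [0, 1] across samples (assumed scheme)."""
    scaled_cols = []
    for col in zip(*samples):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]

def kmeans(points, k, iters=10):
    """Small k-means with the first k points as initial centers (assumed method)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its closest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(x) for x in zip(*members)]
    return labels

def nearest_centroid(train, train_labels, query):
    """Predict the class whose mean profile is closest to the query (assumed method)."""
    centroids = {}
    for lab in set(train_labels):
        rows = [r for r, l in zip(train, train_labels) if l == lab]
        centroids[lab] = [mean(x) for x in zip(*rows)]
    return min(centroids, key=lambda lab: dist(query, centroids[lab]))

# Toy data: 4 samples (rows) x 3 genes (columns); values are made up.
raw = [[10, 200, 5], [12, 220, 6], [50, 40, 30], [55, 35, 28]]
classes = ["normal", "normal", "tumor", "tumor"]

norm = normalize(raw)                      # stage 1: normalization
genes = [list(col) for col in zip(*norm)]  # one vector per gene
gene_clusters = kmeans(genes, k=2)         # stage 2: group similar genes
predicted = nearest_centroid(norm, classes, [0.9, 0.1, 0.95])  # stage 3
```

Here genes with similar normalized profiles fall into the same cluster, and the query sample is assigned to the class with the closest mean profile. A real analysis would use a normalization suited to RNA-seq counts and a classifier chosen for the data at hand.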
