Cross-species and cross-platform gene expression studies with the Bioconductor-compliant R package 'annotationTools'

Background: The variety of DNA microarray formats and datasets presently available offers an unprecedented opportunity to perform insightful comparisons of heterogeneous data. Cross-species studies, in particular, can identify conserved, functionally important molecular processes. Discoveries can now often be validated in readily available public data, which frequently requires cross-platform studies.

Cross-platform and cross-species analyses require matching probes across different microarray formats. This can be achieved using the information in microarray annotations and in additional molecular biology databases, such as orthology databases. Although annotations and other biological information are stored using modern database models (e.g. relational), they are very often distributed and shared as tables in text files, i.e. flat file databases. This common flat file format thus provides a simple and robust way to flexibly integrate various sources of information, and a basis for the combined analysis of heterogeneous gene expression profiles.

Results: We provide annotationTools, a Bioconductor-compliant R package to annotate microarray experiments and integrate heterogeneous gene expression profiles using annotation and other molecular biology information available as flat file databases. First, annotationTools contains a specialized set of functions for mining this widely used database format in a systematic manner. It thus offers a straightforward solution for annotating microarray experiments. Second, building on these basic functions and combining information from several databases, it provides tools to easily perform cross-species analyses of gene expression data.

Here, we present two example applications of annotationTools that are of direct relevance for the analysis of heterogeneous gene expression profiles, namely a cross-platform mapping of probes and a cross-species mapping of orthologous probes using different orthology databases. We also show how to perform an exploratory comparison of disease-related transcriptional changes in human patients and in a genetic mouse model.

Conclusion: The R package annotationTools provides a simple solution for handling microarray annotation and orthology tables, as well as other flat file molecular biology databases. It thereby allows easy integration and analysis of heterogeneous microarray experiments across different technological platforms or species.
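The cross-species mapping described above can be pictured as a chain of lookups over flat tables: source probe to gene identifier, gene identifier to orthologous gene, orthologous gene to target probes. The following is a minimal Python sketch of that idea only. It does not use or reproduce the annotationTools API (which is an R package), and every table layout, column name, function name, and identifier in it is a hypothetical assumption made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated flat files; all probe and gene IDs are invented.
HUMAN_ANNOT = "probe\tgene_id\nhs_p1\t101\nhs_p2\t102\nhs_p3\t\n"
MOUSE_ANNOT = "probe\tgene_id\nmm_p1\t201\nmm_p2\t201\nmm_p3\t203\n"
ORTHOLOGY = "group\tspecies\tgene_id\n1\thuman\t101\n1\tmouse\t201\n2\thuman\t102\n"


def read_table(text):
    """Parse a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def gene_to_probes(rows):
    """Index a platform annotation by gene ID, skipping unannotated probes."""
    index = defaultdict(list)
    for row in rows:
        if row["gene_id"]:
            index[row["gene_id"]].append(row["probe"])
    return index


def ortholog_map(rows, source, target):
    """Map each source-species gene to the target-species genes in its group."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        groups[row["group"]][row["species"]].append(row["gene_id"])
    mapping = defaultdict(list)
    for members in groups.values():
        for gene in members.get(source, []):
            mapping[gene].extend(members.get(target, []))
    return mapping


def map_probes(source_annot, target_annot, orthology, source, target):
    """Return, for every source probe, the list of orthologous target probes.

    Probes without a gene ID, or whose gene has no ortholog in the table,
    map to an empty list rather than being dropped.
    """
    target_index = gene_to_probes(read_table(target_annot))
    orthologs = ortholog_map(read_table(orthology), source, target)
    result = {}
    for row in read_table(source_annot):
        target_genes = orthologs.get(row["gene_id"], [])
        result[row["probe"]] = [
            probe for gene in target_genes for probe in target_index.get(gene, [])
        ]
    return result


mapping = map_probes(HUMAN_ANNOT, MOUSE_ANNOT, ORTHOLOGY, "human", "mouse")
for probe, matches in mapping.items():
    print(probe, matches)
```

Keeping unmatched probes with an empty result makes the one-to-many and one-to-none cases explicit, which matters when the mapped probes are later used to align expression values across datasets.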
