Integrating data from heterogeneous DNA microarray platforms

DNA microarrays are among the most widely used technologies for measuring gene expression. However, there are several distinct microarray platforms from different manufacturers, each with its own measurement protocol, so the resulting data can hardly be compared or directly integrated. Integrating data from multiple sources enlarges the sample set, with the aim of improving the reliability of statistical tests and mitigating the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used to measure gene expression to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprises a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus.
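To make the integration tasks above more concrete, the following is a minimal sketch, not the methodology proposed in this work. It assumes that re-annotation has already mapped the probes of each platform to shared gene identifiers, and it uses simple per-batch mean-centering as a stand-in for batch effect attenuation. The function names (`merge_platforms`, `center_by_batch`) and the toy values are hypothetical and serve only as illustration.

```python
from statistics import mean


def merge_platforms(platforms):
    """Keep only genes present on every platform and concatenate samples.

    platforms: dict mapping platform name -> {gene id: [values per sample]}
    Returns (expression, batches), where batches holds one label per sample.
    """
    names = sorted(platforms)
    shared = set.intersection(*(set(platforms[n]) for n in names))
    expression = {g: [] for g in sorted(shared)}
    batches = []
    for name in names:
        data = platforms[name]
        n_samples = len(next(iter(data.values())))
        batches.extend([name] * n_samples)
        for gene in expression:
            expression[gene].extend(data[gene])
    return expression, batches


def center_by_batch(expression, batches):
    """Subtract, for each gene, the mean of its values within each batch."""
    groups = {}
    for idx, label in enumerate(batches):
        groups.setdefault(label, []).append(idx)
    corrected = {}
    for gene, values in expression.items():
        out = list(values)
        for idxs in groups.values():
            batch_mean = mean([values[i] for i in idxs])
            for i in idxs:
                out[i] = values[i] - batch_mean
        corrected[gene] = out
    return corrected


# Toy data (invented): two Agilent samples, three Affymetrix samples.
platforms = {
    "agilent": {"TP53": [2.0, 4.0], "EGFR": [1.0, 3.0], "ONLY_A": [0.0, 0.0]},
    "affymetrix": {"TP53": [10.0, 12.0, 14.0], "EGFR": [8.0, 8.0, 11.0]},
}
expression, batches = merge_platforms(platforms)
corrected = center_by_batch(expression, batches)
```

Mean-centering removes only an additive shift per gene and per batch. It can also erase real biological differences when a batch coincides with a tumor class, which is one reason more elaborate attenuation methods are usually compared against it.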
