Validation of Merging Techniques for Cancer Microarray Data Sets

A vast amount of gene expression data has been gathered in microarray studies all over the world. Many of these studies use different experimental designs, platforms, and methodologies. Merging information from different studies is therefore an important part of current research in bioinformatics, and several algorithms have been proposed recently. Such merging is needed to create large data sets that allow more statistically relevant analysis. In this article we concisely describe several data merging techniques and apply them to cancer microarray data sets. We study three cases of increasing complexity and test all methods using a number of popular validation criteria. Furthermore, we test the compatibility of the transformed data sets by performing cross-study classification.
