Literature-based Evaluation of Microarray Normalization Procedures

Normalization procedures attempt to remove non-biological variance from microarray data sets. The choice of normalization procedure is important because it strongly affects downstream data analysis. Although many normalization procedures have been developed, comparing and evaluating their performance is difficult. We present a method that evaluates normalization procedures using gene-gene associations derived from the biomedical literature via Latent Semantic Indexing. To assess the effectiveness of each procedure, we calculate the functional coherence of the co-regulated genes obtained from each normalized data set. The method was tested on three popular normalization procedures (MAS5, PDNN and RMA) applied to gene expression data from 71 recombinant inbred mouse brain samples. On average, MAS5 outperformed both PDNN and RMA by producing a higher number of functionally cohesive gene sets. These results demonstrate that our literature-based cohesion analysis can provide an objective method for evaluating normalization procedures.
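
The evaluation idea can be illustrated with a minimal Python sketch. It is not the published implementation: the abstract does not specify how co-regulated gene sets are formed or how cohesion is scored, so the sketch assumes (a) each gene's set is the gene plus its k most correlated genes, (b) cohesion is the mean pairwise literature similarity within the set, and (c) a set counts as cohesive when its score exceeds a quantile of scores from random gene sets. The matrix `lit_sim` stands in for the gene-gene associations that would come from Latent Semantic Indexing; all function and parameter names are hypothetical.

```python
import numpy as np

def coexpressed_sets(expr, k):
    """For each gene (row of expr), return the indices of its k most correlated genes."""
    corr = np.corrcoef(expr)              # gene-by-gene Pearson correlation
    np.fill_diagonal(corr, -np.inf)       # a gene is not its own neighbour
    return np.argsort(-corr, axis=1)[:, :k]

def mean_pairwise_similarity(lit_sim, members):
    """Assumed cohesion score: mean literature similarity over all pairs in the set."""
    sub = lit_sim[np.ix_(members, members)]
    upper = np.triu_indices(len(members), k=1)
    return sub[upper].mean()

def count_cohesive_sets(expr, lit_sim, k=5, n_random=200, alpha=0.05, seed=0):
    """Count gene sets whose cohesion exceeds the (1 - alpha) quantile of random sets."""
    rng = np.random.default_rng(seed)
    n_genes = expr.shape[0]
    # Null distribution: cohesion of randomly drawn sets of the same size (k + 1).
    null = np.array([
        mean_pairwise_similarity(lit_sim, rng.choice(n_genes, size=k + 1, replace=False))
        for _ in range(n_random)
    ])
    threshold = np.quantile(null, 1 - alpha)
    count = 0
    for gene, neighbours in enumerate(coexpressed_sets(expr, k)):
        members = np.append(neighbours, gene)
        if mean_pairwise_similarity(lit_sim, members) > threshold:
            count += 1
    return count
```

Under these assumptions, one would call `count_cohesive_sets` once per normalized version of the same expression matrix (for example the MAS5, PDNN and RMA outputs), reusing the same `lit_sim`, and compare the resulting counts; a higher count indicates that the normalization yields more functionally cohesive co-expression sets.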
