CrossNorm: a novel normalization strategy for microarray data in cancers

Normalization is essential for removing biases from microarray data so that they can be analyzed accurately. Existing normalization methods for microarray gene expression data commonly assume that the samples being studied share a similar global expression pattern. However, global shifts in gene expression are common in cancers, which invalidates this assumption. To alleviate the problem, we propose and develop a novel normalization strategy, Cross Normalization (CrossNorm), for microarray data with unbalanced transcript levels among samples. Conventional procedures, such as RMA and LOESS, arbitrarily flatten the difference between case and control groups, leading to biased gene expression estimates. Notably, when these methods are applied under the CrossNorm strategy, which makes use of the overall statistics of the original signals, they show significantly improved robustness and accuracy in estimating transcript level dynamics across a series of publicly available datasets, including a titration experiment, simulated data, spike-in data, and several real microarray datasets from various types of cancer. These results have important implications for past and future cancer studies based on microarray samples with non-negligible global differences between groups. Moreover, the strategy can also be applied to other kinds of high-throughput data, provided the experiments show global expression variation between conditions.
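The flattening effect attributed above to conventional procedures can be illustrated with a small sketch. It does not implement CrossNorm, whose algorithm is not described in this abstract. It only shows, on made-up data, how plain quantile normalization (the between-array step used by RMA) erases a genuine global shift between a control and a case sample. The function name `quantile_normalize`, the simulated intensities, and the one-unit shift are all assumptions chosen for illustration.

```python
import numpy as np

def quantile_normalize(matrix):
    """Force every column (sample) to share the same empirical distribution."""
    order = np.argsort(matrix, axis=0)        # per-sample ordering of genes
    sorted_vals = np.sort(matrix, axis=0)     # per-sample sorted intensities
    reference = sorted_vals.mean(axis=1)      # mean intensity at each rank
    normalized = np.empty_like(matrix, dtype=float)
    for j in range(matrix.shape[1]):
        # The gene with the k-th smallest value in sample j gets reference[k].
        normalized[order[:, j], j] = reference
    return normalized

# Hypothetical log2 intensities: every gene in the case sample is
# up-regulated by one log2 unit relative to the control sample.
rng = np.random.default_rng(0)
control = rng.normal(8.0, 1.0, size=1000)
case = control + 1.0
data = np.column_stack([control, case])

normalized = quantile_normalize(data)

# Mean case-minus-control difference before and after normalization.
shift_before = float(np.mean(data[:, 1] - data[:, 0]))
shift_after = float(np.mean(normalized[:, 1] - normalized[:, 0]))
print(f"global shift before: {shift_before:.3f}, after: {shift_after:.3f}")
```

Because both samples rank their genes identically in this toy setup, quantile normalization assigns them the same values and the true global shift disappears, which is the kind of bias the abstract describes for data with unbalanced transcript levels.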
