Volcano Plots in Analyzing Differential Expressions with mRNA Microarrays

A volcano plot displays an unstandardized signal (e.g., log fold change) against a noise-adjusted, standardized signal (e.g., the t-statistic or -log10(p-value) from the t-test). We review the basic and interactive uses of the volcano plot and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics appears as a curved discriminant line in the volcano plot, whereas the "double filtering" criterion appears as two perpendicular lines. This review attempts to provide a unifying framework for discussions of alternative measures of differential expression, improved methods for estimating variance, and the visual display of microarray analysis results. We also discuss the possibility of applying volcano plots to fields beyond microarrays.
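The contrast between the two selection criteria can be illustrated with a minimal Python sketch. It is not taken from the review: the function names, the simulated data, the cutoffs, and the particular regularized form d = (log fold change) / (standard error + s0) with a fixed constant `s0` are all assumptions chosen for illustration. Under that assumed form, 1/d = 1/t + s0/(log fold change), so a fixed cutoff on d traces a curve in the (log fold change, t) plane, while double filtering cuts the plane with one vertical and one horizontal line.

```python
import numpy as np


def volcano_statistics(group_a, group_b, s0=0.0):
    """Return per-gene log fold change and (regularized) t-statistic.

    group_a, group_b: arrays of shape (genes, samples) with log2 expression.
    s0: hypothetical constant added to the standard error; s0 = 0 gives the
        ordinary pooled-variance two-sample t-statistic.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = a.shape[1], b.shape[1]

    # x-axis of the volcano plot: difference of mean log expression.
    log_fc = a.mean(axis=1) - b.mean(axis=1)

    # Pooled variance and standard error of the mean difference.
    pooled_var = (
        (n_a - 1) * a.var(axis=1, ddof=1) + (n_b - 1) * b.var(axis=1, ddof=1)
    ) / (n_a + n_b - 2)
    std_err = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))

    return log_fc, log_fc / (std_err + s0)


def double_filter(log_fc, t_stat, fc_cut, t_cut):
    """Two perpendicular cutoffs: one on fold change, one on the t-statistic."""
    return (np.abs(log_fc) >= fc_cut) & (np.abs(t_stat) >= t_cut)


def joint_filter(reg_t, cut):
    """Single cutoff on the regularized statistic (curved line in the plot)."""
    return np.abs(reg_t) >= cut


# Simulated data (illustrative only): 1000 genes, 5 samples per group,
# with the first 50 genes shifted upward in group A.
rng = np.random.default_rng(0)
expr_a = rng.normal(8.0, 0.5, size=(1000, 5))
expr_b = rng.normal(8.0, 0.5, size=(1000, 5))
expr_a[:50] += 1.5

log_fc, t_stat = volcano_statistics(expr_a, expr_b)
_, reg_t = volcano_statistics(expr_a, expr_b, s0=0.2)

selected_double = double_filter(log_fc, t_stat, fc_cut=1.0, t_cut=3.0)
selected_joint = joint_filter(reg_t, cut=3.0)
print(selected_double.sum(), selected_joint.sum())
```

Plotting `log_fc` against `t_stat` (or against -log10 of the p-value) and coloring points by each mask would show the rectangular region selected by double filtering next to the curved boundary of the joint criterion.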