Proportionality: A Valid Alternative to Correlation for Relative Data

In the life sciences, many measurement methods yield only the relative abundances of different components in a sample. With such relative (or compositional) data, differential expression needs careful interpretation, and correlation, a statistical workhorse for analysing pairwise relationships, is an inappropriate measure of association. Using yeast gene expression data, we show how correlation can be misleading and present proportionality as a valid alternative for relative data. We show how the strength of proportionality between two variables can be described meaningfully and interpretably by a new statistic, ϕ, which can replace correlation as the basis of familiar analyses and visualisation methods, including co-expression networks and clustered heatmaps. While the main aim of this study is to present proportionality as a means to analyse relative data, it also raises intriguing questions about the molecular mechanisms underlying the proportional regulation of a range of yeast genes.
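To make the idea of a proportionality statistic concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the definition ϕ = var(log(x/y)) / var(log x), which this abstract does not state, and it omits any log-ratio transformation that the full method may apply beforehand. The function names `phi` and `closure` and the toy abundances are hypothetical, chosen only for illustration.

```python
import math
from statistics import variance


def closure(*components):
    """Convert absolute abundances to relative ones.

    Each argument is one component measured across the same samples;
    every sample is divided by its own total, so only proportions remain.
    """
    totals = [sum(sample) for sample in zip(*components)]
    return [[v / t for v, t in zip(comp, totals)] for comp in components]


def phi(x, y):
    """Assumed definition: var(log(x/y)) / var(log x).

    The value is 0 when y is exactly proportional to x and grows as the
    log-ratio varies more across samples. It is not symmetric in x and y.
    """
    log_x = [math.log(v) for v in x]
    log_ratio = [math.log(a / b) for a, b in zip(x, y)]
    return variance(log_ratio) / variance(log_x)


# Hypothetical absolute abundances of three components in four samples.
x = [1.0, 2.0, 4.0, 8.0]
y = [2.0, 4.0, 8.0, 16.0]   # exactly proportional to x
z = [5.0, 3.0, 6.0, 2.0]    # unrelated to x

# Only the relative abundances are assumed to be observable.
x_rel, y_rel, z_rel = closure(x, y, z)

print(phi(x_rel, y_rel))  # close to zero: proportionality survives closure
print(phi(x_rel, z_rel))  # clearly larger: x and z are not proportional
```

The sketch relies on the fact that the ratio x/y within a sample is unchanged when every component of that sample is divided by the same total, so a statistic built on log-ratios can still be read meaningfully when only relative abundances are available.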
