Log-Ratio Analysis of Finite Precision Data: Caveats, and Connections to Digital Lines and Number Theory

Log-ratio analysis (LRA) is a popular and theoretically coherent framework for investigating and modelling compositional data. Most empirical compositional data are measured and recorded with finite precision. Count data are a special case in which the quantity of interest is itself discrete, but it is also common practice to round continuous variables to the nearest convenient multiple of the unit of measurement. LRA is often applied to such finite precision measurements without considering the underlying discrete nature of the data (zero values being the one exception that routinely receives attention). Here we examine how the characteristics of finite precision data can manifest in LRA, so that theoreticians and practitioners can be mindful of situations in which finite precision might affect their conclusions. We focus in particular on log-ratio variance, a fundamental measure of pairwise association between components, and demonstrate situations in which finite precision can have a profound effect on this statistic and on related measures of proportionality. We also draw on the computer science concept of digital lines and on number theory, including Farey sequences, to understand how finite precision approximations can affect the value of log-ratio variance even when the underlying continuous variables are perfectly proportional.
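The effect described in the abstract can be illustrated with a minimal sketch. It assumes that log-ratio variance is computed as the sample variance of log(x/y) over paired observations. The slope of 0.37, the range of x values, rounding to the nearest integer, and the dropping of pairs that round to zero are all illustrative assumptions, not choices taken from the paper.

```python
import math
import statistics


def log_ratio_variance(x, y):
    """Sample variance of log(x_i / y_i) over paired positive observations."""
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    return statistics.variance(log_ratios)


# Hypothetical data: y is exactly proportional to x before any rounding.
slope = 0.37
x = [float(i) for i in range(1, 51)]
y = [slope * xi for xi in x]

# Record y with finite precision (nearest integer). Python's round() uses
# round-half-to-even; no exact halves occur for this slope and range.
# Pairs whose rounded value is zero are dropped so the logarithm is defined;
# this is a simplifying assumption, not a recommended treatment of zeros.
pairs = [(xi, round(yi)) for xi, yi in zip(x, y) if round(yi) > 0]
x_kept = [p[0] for p in pairs]
y_rounded = [p[1] for p in pairs]

exact = log_ratio_variance(x, y)                # essentially zero
rounded = log_ratio_variance(x_kept, y_rounded)  # clearly above zero

print(f"log-ratio variance, exact values:   {exact:.3e}")
print(f"log-ratio variance, rounded values: {rounded:.3e}")
```

Under these assumptions the unrounded pairs give a log-ratio variance that is zero up to floating-point error, while the rounded pairs do not, even though the underlying variables are perfectly proportional. The integer pairs (x, round(0.37 x)) trace out the kind of staircase that the digital line literature studies.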
