Toward Bayesian Data Compression

To handle the large datasets omnipresent in modern science, efficient compression algorithms are necessary. Here, a Bayesian data compression (BDC) algorithm that adapts to the specific measurement situation is derived in the context of signal reconstruction. BDC compresses a dataset while conserving its posterior structure, with minimal information loss given the prior knowledge on the signal, the quantity of interest. Its basic form is valid for Gaussian priors and likelihoods. For constant noise standard deviation, basic BDC becomes equivalent to a Bayesian analog of principal component analysis. Using metric Gaussian variational inference, BDC generalizes to non-linear settings. In its current form, BDC requires storing effective instrument response functions and the corresponding noise covariance for the compressed data, which together encode the posterior covariance structure. Their memory demand counteracts the compression gain. To mitigate this, sparse compressed responses can be obtained by separating the data into patches and compressing each patch separately. The applicability of BDC is demonstrated on synthetic data and on radio astronomical data. Still, the algorithm needs further improvement, as the computation time of the compression plus the subsequent inference exceeds the time of the inference with the original data.
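The linear Gaussian setting described above can be illustrated with a small sketch. It assumes a measurement d = R s + n with a zero-mean Gaussian prior covariance S and Gaussian noise covariance N. It compresses the data with a signal-to-noise (Karhunen-Loève-style) eigenmode projection: it keeps the k generalized eigenvectors of (R S R^T, N) with the largest eigenvalues, which maximize the mutual information between signal and compressed data among k-dimensional linear projections. This technique is a stand-in chosen for illustration and is not necessarily the exact BDC procedure. The helper names `compress` and `posterior`, as well as all problem sizes, are hypothetical. The compressed response R' = B R and noise N' = B N B^T are the quantities that would have to be stored. For N = σ²·1 the eigenproblem reduces to an ordinary eigendecomposition of R S R^T, that is, a principal component analysis of the prior signal covariance in data space.

```python
import numpy as np
from scipy.linalg import eigh


def compress(R, S, N, k):
    """Hypothetical helper: keep the k data-space modes with the highest signal-to-noise ratio.

    Solves the generalized eigenproblem (R S R^T) b = lam N b. For a symmetric
    positive-definite N, scipy.linalg.eigh returns ascending eigenvalues and
    eigenvectors normalized so that b^T N b = 1, so the compressed noise
    covariance B N B^T is the k x k identity.
    """
    lam, V = eigh(R @ S @ R.T, N)
    order = np.argsort(lam)[::-1][:k]   # indices of the k largest eigenvalues
    B = V[:, order].T                   # compression matrix, shape (k, n_data)
    return B, B @ R, B @ N @ B.T, lam[order]


def posterior(R, S, N, d):
    """Wiener-filter posterior of s given d = R s + n, s ~ G(0, S), n ~ G(0, N)."""
    N_inv = np.linalg.inv(N)
    D = np.linalg.inv(np.linalg.inv(S) + R.T @ N_inv @ R)  # posterior covariance
    m = D @ R.T @ N_inv @ d                                # posterior mean
    return m, D


# Synthetic linear Gaussian problem (all numbers chosen for illustration only).
rng = np.random.default_rng(42)
n_sig, n_dat = 40, 200
x = np.arange(n_sig)
S = np.exp(-np.abs(x[:, None] - x[None, :]) / 5.0)      # exponential prior covariance
R = rng.normal(size=(n_dat, n_sig)) / np.sqrt(n_sig)    # dense random response
N = np.diag(rng.uniform(0.5, 2.0, size=n_dat))          # varying noise variances
s = np.linalg.cholesky(S) @ rng.normal(size=n_sig)
d = R @ s + np.sqrt(np.diag(N)) * rng.normal(size=n_dat)

m_full, D_full = posterior(R, S, N, d)
for k in (5, 20, n_sig):
    B, R_c, N_c, lam = compress(R, S, N, k)
    m_c, D_c = posterior(R_c, S, N_c, B @ d)
    # ratio of posterior variances (>= 1) and largest deviation of the mean
    print(k, np.trace(D_c) / np.trace(D_full), np.max(np.abs(m_c - m_full)))
```

In this toy model, keeping k = n_sig modes should reproduce the full posterior up to rounding error. Data directions orthogonal to the range of R carry no information about s, so 200 numbers shrink to 40 without loss. Smaller k trades posterior precision for fewer stored numbers. Note that R_c is a dense k x n_sig matrix even when R is sparse. This is the storage cost that the patch-wise compression mentioned above is meant to reduce.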
