Entropy, Information Theory, Information Geometry and Bayesian Inference in Data, Signal and Image Processing and Inverse Problems

This review article first surveys the main inference tools: Bayes' rule, the maximum entropy principle (MEP), information theory, relative entropy and the Kullback–Leibler (KL) divergence, and Fisher information and its corresponding geometries. For each of these tools, the precise context of its use is described. The second part of the paper focuses on the ways these tools have been used in data, signal and image processing and in the inverse problems that arise in different physical sciences and engineering applications. A few example applications are described: entropy in independent component analysis (ICA) and blind source separation, Fisher information in data model selection, different maximum entropy-based methods in time series spectral estimation and in linear inverse problems and, finally, Bayesian inference for general inverse problems. Some original material concerning approximate Bayesian computation (ABC) and, in particular, the variational Bayesian approximation (VBA) methods is also presented. VBA is proposed as an alternative Bayesian computational tool to the classical Markov chain Monte Carlo (MCMC) methods. We also show that VBA encompasses joint maximum a posteriori (JMAP) estimation and the different expectation–maximization (EM) algorithms as particular cases.
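
As a minimal sketch of this last point, in generic notation introduced here for illustration (not taken verbatim from the paper): let y denote the data, x the hidden variables and θ the hyperparameters. VBA approximates the joint posterior p(x, θ | y) by a separable distribution q_1(x) q_2(θ) chosen to minimize the Kullback–Leibler divergence

\[
(\hat q_1,\hat q_2)=\arg\min_{q_1,q_2}\;\mathrm{KL}\big(q_1(x)\,q_2(\theta)\,\big\|\,p(x,\theta\mid y)\big),
\]

which leads to the alternating updates

\[
q_1(x)\propto\exp\!\big(\langle\ln p(y,x,\theta)\rangle_{q_2(\theta)}\big),
\qquad
q_2(\theta)\propto\exp\!\big(\langle\ln p(y,x,\theta)\rangle_{q_1(x)}\big).
\]

Constraining both factors to degenerate (Dirac) distributions turns these updates into an alternating maximization of the joint posterior, i.e. the JMAP algorithm, while leaving q_1(x) free and constraining only q_2(θ) to a Dirac distribution recovers the EM algorithm.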
