Multiscale sparse microcanonical models

We study density estimation of stationary processes defined over an infinite grid from a single, finite realization. Gaussian Processes and Markov Random Fields avoid the curse of dimensionality by focusing on low-order and localized potentials respectively, but their application to complex datasets is limited by an inability to capture singularities and long-range interactions, and by expensive inference and learning, respectively. These are instances of Gibbs models, defined as maximum entropy distributions under moment constraints determined by an energy vector. The Boltzmann equivalence principle states that, under appropriate ergodicity, such macrocanonical models are approximated by their microcanonical counterparts, which replace the expectation by the sample average. Microcanonical models are appealing because they avoid the expensive computation of the Lagrange multipliers needed to meet the constraints. This paper introduces microcanonical measures whose energy vector is given by a wavelet scattering transform, built by cascading wavelet decompositions and point-wise nonlinearities. We study asymptotic properties of generic microcanonical measures, which reveal the fundamental role of the differential structure of the energy vector in controlling, for example, the entropy rate. Gradient information is also used to define a microcanonical sampling algorithm, for which we provide a convergence analysis to the microcanonical measure. Whereas wavelet transforms capture local regularity at different scales, scattering transforms provide scale interaction information, which is critical for restoring the geometry of many physical phenomena. We demonstrate the efficiency of sparse multiscale microcanonical measures on several processes and real data exhibiting long-range interactions, such as Ising models, Cox processes, and image and audio textures.
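
The gradient-based sampling idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the energy vector here is a toy stand-in (spatial averages of a smoothed modulus of Haar-like filter outputs, first order only) rather than a wavelet scattering transform, and the white-noise initialization, the backtracking step rule, and all names (`haar_filters`, `energy`, `loss_and_grad`, `sample`) are assumptions made for illustration. The sketch draws Gaussian white noise and moves it by gradient descent until its energy vector approaches the one measured on a single observed realization.

```python
import numpy as np

def haar_filters(d, num_scales):
    """Fourier transforms of zero-mean Haar-like filters at dyadic scales (toy choice)."""
    filters = []
    for j in range(num_scales):
        half = 2 ** j
        h = np.zeros(d)
        h[:half] = 1.0 / (2 * half)
        h[half:2 * half] = -1.0 / (2 * half)
        filters.append(np.fft.fft(h))
    return np.array(filters)  # shape (num_scales, d)

def energy(x, filters_hat, eps=1e-6):
    """Toy energy vector: per-scale spatial average of a smoothed modulus."""
    # Circular convolution of x with every filter, computed in the Fourier domain.
    u = np.fft.ifft(np.fft.fft(x)[None, :] * filters_hat, axis=1).real
    return np.sqrt(u ** 2 + eps).mean(axis=1), u

def loss_and_grad(x, target, filters_hat, eps=1e-6):
    """Return 0.5 * ||energy(x) - target||^2 and its gradient with respect to x."""
    phi, u = energy(x, filters_hat, eps)
    residual = phi - target
    # Chain rule: derivative of the smoothed modulus, scaled by the residual.
    w = residual[:, None] * u / np.sqrt(u ** 2 + eps) / x.size
    # Adjoint of each circular convolution, summed over scales.
    grad = np.fft.ifft(np.fft.fft(w, axis=1) * np.conj(filters_hat), axis=1).real.sum(axis=0)
    return 0.5 * np.sum(residual ** 2), grad

def sample(target, filters_hat, d, num_steps=200, seed=0):
    """Gradient descent from Gaussian white noise (assumed initialization)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)
    loss, grad = loss_and_grad(x, target, filters_hat)
    history = [loss]
    for _ in range(num_steps):
        step = float(d)  # the energy is an average, so useful steps scale with d
        for _ in range(40):  # backtracking: halve the step until the loss decreases
            x_new = x - step * grad
            new_loss, new_grad = loss_and_grad(x_new, target, filters_hat)
            if new_loss < loss:
                x, loss, grad = x_new, new_loss, new_grad
                break
            step *= 0.5
        history.append(loss)
    return x, history

d, num_scales = 256, 4
filters_hat = haar_filters(d, num_scales)
rng = np.random.default_rng(1)
# Single observed realization: a sparse spike train (hypothetical test signal).
y = rng.standard_normal(d) * (rng.random(d) < 0.1)
target, _ = energy(y, filters_hat)
x, history = sample(target, filters_hat, d)
print("initial loss:", history[0], "final loss:", history[-1])
```

Different seeds give different samples whose energy vectors all approach that of the observation, which is the sense in which such a procedure approximately samples from a microcanonical set. Whether the resulting distribution matches the microcanonical measure depends on conditions analyzed in the paper, not on anything this sketch establishes.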
