Product formalisms for measures on spaces with binary tree structures: representation, visualization, and multiscale noise

Abstract. We present a theoretical foundation for representing a data set as a measure in a very large, hierarchically parametrized family of positive measures whose parameters can be computed explicitly (rather than estimated by optimization), and we illustrate its applicability to a wide range of data types. The preprocessing step then consists of representing data sets as simple measures. The theoretical foundation consists of a dyadic product formula representation lemma and a visualization theorem. We also define an additive multiscale noise model that can be used to sample from dyadic measures, and a more general multiplicative multiscale noise model that can be used to perturb continuous functions, Borel measures, and dyadic measures. The first two results are based on theorems in [15, 3, 1]. The representation uses the very simple concept of a dyadic tree and hence is widely applicable, easily understood, and easily computed. Since the data sample is represented as a measure, subsequent analysis can exploit statistical and measure-theoretic concepts and theories. Because the dyadic tree is defined on the universe of a data set, and the parameters are explicitly computable, easily interpretable, and easily visualizable, we hope that this approach will be broadly useful to mathematicians, statisticians, and computer scientists who are intrigued by or involved in data science, including its mathematical foundations.
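As a rough illustration of what "parameters computed explicitly" can mean, the sketch below treats a sample of points in [0, 1) as a counting measure on dyadic intervals. It assumes the common convention that each interval I carries one coefficient a_I = (mu(left half) - mu(right half)) / mu(I), so that the left half receives the fraction (1 + a_I)/2 of the mass of I and the right half receives (1 - a_I)/2. The function names, the choice of [0, 1) as the universe, and the convention a_I = 0 on empty intervals are assumptions made for this example, not definitions taken from the paper.

```python
from fractions import Fraction


def dyadic_counts(points, depth):
    """Count sample points in each dyadic interval [k/2^level, (k+1)/2^level)."""
    counts = {}
    for level in range(depth + 1):
        n = 2 ** level
        for x in points:
            k = min(int(x * n), n - 1)  # index of the interval containing x
            counts[(level, k)] = counts.get((level, k), 0) + 1
    return counts


def product_coefficients(counts, depth):
    """Compute a_I = (mass(left) - mass(right)) / mass(I) for every interval I."""
    coeffs = {}
    for level in range(depth):
        for k in range(2 ** level):
            total = counts.get((level, k), 0)
            if total == 0:
                # Assumed convention: an empty interval gets coefficient 0.
                coeffs[(level, k)] = Fraction(0)
                continue
            left = counts.get((level + 1, 2 * k), 0)
            right = counts.get((level + 1, 2 * k + 1), 0)
            coeffs[(level, k)] = Fraction(left - right, total)
    return coeffs


def leaf_measure(coeffs, total_mass, depth, k):
    """Rebuild the mass of leaf interval k at the given depth as a product."""
    mass = Fraction(total_mass)
    for level in range(depth):
        ancestor = k >> (depth - level)            # ancestor index at this level
        goes_right = (k >> (depth - level - 1)) & 1  # which child leads to the leaf
        a = coeffs[(level, ancestor)]
        mass *= (1 - a) / 2 if goes_right else (1 + a) / 2
    return mass
```

Every coefficient is a ratio of counts, so no optimization is involved, and multiplying the factors along the path from the root to a leaf recovers that leaf's count exactly.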

[1]  Peter W. Jones Factorization of $A_p$ weights , 1980 .

[2]  P. Koskela Removable sets for Sobolev spaces , 1999 .

[3]  Peter W. Jones Factorization of $A_p$ weights , 1980 .

[4]  Vladimir Rokhlin,et al.  Randomized approximate nearest neighbors algorithm , 2011, Proceedings of the National Academy of Sciences.

[5]  M. Maggioni Geometry of Data and Biology , 2015 .

[6]  Christopher J. Bishop,et al.  Hausdorff dimension and Kleinian groups , 1994 .

[7]  Jean-Paul Berroir,et al.  Multifractal Segmentation of Medical Images , 1994 .

[8]  G. W. Stewart,et al.  Matrix Algorithms: Volume 1, Basic Decompositions , 1998 .

[9]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[10]  Joan Bruna,et al.  Learning Stable Group Invariant Representations with Convolutional Networks , 2013, ICLR.

[11]  Vincent Vargas,et al.  Gaussian multiplicative chaos and applications: A review , 2013, 1305.6221.

[12]  Douglas Comer,et al.  Internetworking with TCP/IP , 1988 .

[13]  Linda Ness Inference of a Dyadic Measure and Its Simplicial Geometry from Binary Feature Data and Application to Data Quality , 2017, Association for Women in Mathematics Series.

[14]  Peter W. Jones,et al.  Wiggly sets and limit sets , 1997 .

[15]  I. Johnstone,et al.  Minimax estimation via wavelet shrinkage , 1998 .

[16]  R. Bowen Hausdorff dimension of quasi-circles , 1979 .

[17]  B. Mandelbrot Possible refinement of the lognormal hypothesis concerning the distribution of energy dissipation in intermittent turbulence , 1972 .

[18]  Nir Friedman,et al.  Probabilistic Graphical Models - Principles and Techniques , 2009 .

[19]  Linda Ness,et al.  Heuristic Framework for Multi-Scale Testing of the Multi-Manifold Hypothesis , 2018, Association for Women in Mathematics Series.

[20]  Daniel Kunin,et al.  Loss Landscapes of Regularized Linear Autoencoders , 2019, ICML.

[21]  Murad S. Taqqu,et al.  On the Self-Similar Nature of Ethernet Traffic , 1993, SIGCOMM.

[22]  J. Wilson,et al.  Some weighted norm inequalities concerning the Schrödinger operators , 1985 .

[23]  Alexander Cloninger,et al.  Provable approximation properties for deep neural networks , 2015, ArXiv.

[24]  Stéphane Mallat,et al.  Invariant Scattering Convolution Networks , 2012, IEEE transactions on pattern analysis and machine intelligence.

[25]  Yoav Goldberg,et al.  A Primer on Neural Network Models for Natural Language Processing , 2015, J. Artif. Intell. Res.

[26]  Peter J. Holden Extension theorems for functions of vanishing mean oscillation , 1990 .

[27]  Dimitri Lague,et al.  3D Terrestrial LiDAR data classification of complex natural scenes using a multi-scale dimensionality criterion: applications in geomorphology , 2011, ArXiv.

[28]  Ann B. Lee,et al.  Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[29]  W. F. Osgood A Jordan curve of positive area .

[30]  P. Jones On removable sets for Sobolev spaces in the plane , 1991, math/9201298.

[31]  George Cybenko,et al.  Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst.

[32]  E. Saksman,et al.  Random conformal weldings , 2009, 0909.1003.

[33]  Per-Gunnar Martinsson,et al.  Randomized algorithms for the low-rank approximation of matrices , 2007, Proceedings of the National Academy of Sciences.

[34]  Karen A. Scarfone,et al.  Security of Interactive and Automated Access Management Using Secure Shell (SSH) , 2015 .

[35]  K. Okikiolu Characterization of Subsets of Rectifiable Curves in Rn , 1992 .

[36]  M. Maggioni,et al.  Manifold parametrizations by eigenfunctions of the Laplacian and heat kernels , 2008, Proceedings of the National Academy of Sciences.

[37]  David Shallcross,et al.  Centralized multi-scale singular value decomposition for feature construction in LIDAR image classification problems , 2012, 2012 IEEE Applied Imagery Pattern Recognition Workshop (AIPR).

[38]  E. Sharon,et al.  2D-Shape Analysis Using Conformal Mapping , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[39]  J. Kahane Sur le chaos multiplicatif , 1985 .

[40]  R. Fefferman,et al.  The theory of weights and the Dirichlet problem for elliptic equations , 1991 .

[41]  David Mumford,et al.  2D-Shape Analysis Using Conformal Mapping , 2004, CVPR.

[42]  Peter W. Jones Quasiconformal mappings and extendability of functions in Sobolev spaces , 1981 .

[43]  J. Kahane,et al.  Sur certaines martingales de Benoit Mandelbrot , 1976 .

[44]  Mikhail Belkin,et al.  Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.

[45]  Jitendra Malik,et al.  Spectral grouping using the Nystrom method , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Amela Zekovic,et al.  Multifractal analysis of 3D video representation formats , 2014, EURASIP J. Wirel. Commun. Netw.

[47]  Peter W. Jones,et al.  A multiscale guide to Brownian motion , 2015, 1505.04525.

[48]  G. Stewart Matrix Algorithms, Volume II: Eigensystems , 2001 .

[49]  B. Mandelbrot,et al.  Multifractal products of cylindrical pulses , 2002 .

[50]  Spectral theory, Hausdorff dimension and the topology of hyperbolic 3-manifolds , 1998, math/9810124.

[51]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[52]  J. Campbell Introduction to remote sensing , 1987 .

[53]  Stéphane Mallat,et al.  Group Invariant Scattering , 2011, ArXiv.

[54]  R. Nowak,et al.  Multiscale likelihood analysis and complexity penalized estimation , 2004, math/0406424.

[55]  Linda Ness Dyadic Product Formula Representations of Confidence Measures and Decision Rules for Dyadic Data Set Samples , 2016, MISNC, SI, DS 2016.

[56]  L. Ahlfors,et al.  The boundary correspondence under quasiconformal mappings , 1956 .

[57]  Stéphane Lafon,et al.  Diffusion maps , 2006 .

[58]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[59]  Quansheng Liu Sur certaines martingales de Mandelbrot , 1999 .

[60]  Kôtaro Oikawa Welding of polygons and the type of Riemann surfaces , 1961 .