Towards a Mathematical Theory of Abstraction

While the utility of well-chosen abstractions for understanding and predicting the behaviour of complex systems is well appreciated, precisely what an abstraction is has so far largely eluded mathematical formalization. In this paper, we set out a mathematical theory of abstraction. We provide a precise characterisation of what an abstraction is and, perhaps more importantly, suggest how abstractions can be learnt directly from data, both for static datasets and for dynamical systems. We define an abstraction to be a small set of ‘summaries’ of a system which can be used to answer a set of queries about the system or its behaviour. The difference between the ground-truth behaviour of the system on the queries and the behaviour predicted from the abstraction alone provides a measure of the ‘leakiness’ of the abstraction, which can be used as a loss function to learn abstractions directly from data. Our approach can be considered a generalization of classical statistics in which we are not interested in reconstructing ‘the data’ in full, but only in answering a set of arbitrary queries about the data. While highly theoretical, our results have implications for statistical inference and machine learning and could be used to develop explicit methods for learning particular kinds of abstractions directly from data.
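The query-based notion of leakiness described above can be illustrated with a minimal sketch. The abstraction below (keeping only a sample mean and variance), the particular queries, and the Gaussian surrogate used to answer queries from the abstraction alone are all illustrative assumptions, not the paper's own construction; they merely show how a leakiness score could serve as a loss:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)

# Hypothetical queries about the system: each maps a dataset to a number.
queries = [
    lambda x: np.mean(x),
    lambda x: np.var(x),
    lambda x: np.mean(x > 3.0),  # a tail probability
]

# A candidate abstraction: a small set of summaries (here, mean and variance).
summary = (np.mean(data), np.var(data))

# Answer the queries from the abstraction alone by sampling a surrogate
# dataset consistent with the summaries (assumed Gaussian for this sketch).
surrogate = rng.normal(loc=summary[0],
                       scale=np.sqrt(summary[1]),
                       size=100_000)

# Leakiness: total discrepancy between ground-truth query answers
# and the answers predicted from the abstraction alone.
leakiness = sum(abs(q(data) - q(surrogate)) for q in queries)
```

Because `leakiness` is an ordinary scalar discrepancy, it could in principle be minimised over a family of candidate summaries, which is the sense in which the abstract suggests abstractions can be learnt directly from data.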
