Structure or Noise?

Authors: Susanne Still and James P. Crutchfield

Abstract: We show how rate-distortion theory provides a mechanism for automated theory building by naturally distinguishing between regularity and randomness. We start from the simple principle that model variables should, as much as possible, render the future and past conditionally independent. From this, we construct an objective function for model making whose extrema embody the trade-off between a model's structural complexity and its predictive power. The solutions correspond to a hierarchy of models that, at each level of complexity, achieve optimal predictive power at minimal cost. In the limit of maximal prediction, the resulting optimal model identifies a process's intrinsic organization by extracting the underlying causal states. In this limit, the model's complexity is given by the statistical complexity, which is known to be minimal for achieving maximum prediction. Examples show how theory building can profit from analyzing a process's causal compressibility, which is reflected in the optimal models' rate-distortion curve: the process's characteristic for optimally balancing structure and noise at different levels of representation.
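For concreteness, the trade-off the abstract describes can be written in information-bottleneck form, following the information bottleneck method of Tishby et al. and the authors' companion work on optimal causal inference. The notation below is an assumed reconstruction, not quoted from the paper: with past $\overleftarrow{X}$, future $\overrightarrow{X}$, model variable $R$, and trade-off parameter $\beta \ge 0$, the family of optimal models solves

    \min_{p(r \mid \overleftarrow{x})} \; I(\overleftarrow{X}; R) \;-\; \beta\, I(R; \overrightarrow{X})

The first term is the coding rate (the model's structural complexity); the second is its predictive power. Sweeping $\beta$ traces out the rate-distortion curve mentioned above, and in the limit $\beta \to \infty$ the optimal $R$ recovers the causal states, with the rate approaching the statistical complexity $C_\mu$.

A minimal numpy sketch of the self-consistent fixed-point update that extremizes this objective for a fixed beta is given below. Everything here (the function name, array shapes, and epsilon smoothing) is an illustrative assumption, not code from the paper:

    # Illustrative sketch of the information-bottleneck fixed-point update;
    # an assumed reconstruction, not the authors' code.
    import numpy as np

    def ib_update(p_r_given_x, p_x, p_f_given_x, beta, eps=1e-12):
        """One fixed-point update of the IB self-consistent equations.

        p_r_given_x : (R, X) current model assignment p(r | past x)
        p_x         : (X,)   marginal distribution over pasts
        p_f_given_x : (F, X) the process's conditional p(future f | past x)
        beta        : trade-off parameter weighting predictive power
        """
        p_r = p_r_given_x @ p_x                                   # marginal p(r)
        p_x_given_r = (p_r_given_x * p_x) / (p_r[:, None] + eps)  # Bayes' rule
        p_f_given_r = p_x_given_r @ p_f_given_x.T                 # (R, F): predicted futures
        # KL divergence D[ p(f|x) || p(f|r) ] for every pair (r, x): shape (R, X)
        log_px = np.log(p_f_given_x.T + eps)                      # (X, F)
        log_pr = np.log(p_f_given_r + eps)                        # (R, F)
        kl = np.sum(p_f_given_x.T[None, :, :] * (log_px[None, :, :] - log_pr[:, None, :]),
                    axis=2)
        # Boltzmann-like reassignment, renormalized over model states r
        new = p_r[:, None] * np.exp(-beta * kl)
        return new / (new.sum(axis=0, keepdims=True) + eps)

Iterating ib_update to convergence at successively larger beta values (deterministic annealing) and recording the converged pair (I(past; R), I(R; future)) at each step traces out the process's characteristic curve for balancing structure against noise.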
