Estimation and Inference by Compact Coding

SUMMARY The systematic variation within a set of data, as represented by a conventional statistical model, may be used to encode the data more compactly than would be possible if they were treated as purely random. The encoded form has two parts: the first states the inferred estimates of the unknown parameters in the model; the second states the data using an optimal code based on the data probability distribution implied by those parameter estimates. Choosing the model and the estimates that give the most compact coding leads to an interesting general inference procedure. In its strict form it has great generality and several desirable properties but is computationally infeasible. An approximate form is developed and its relation to other methods is explored.
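The two-part scheme described above can be illustrated with a minimal sketch. The example below (an illustration only, not the paper's procedure) encodes a sequence of biased coin flips: the first part states a Bernoulli parameter discretized to one of several levels, the second part gives the Shannon code length of the data under that parameter. The discretization grids and the data are hypothetical choices for the demonstration.

```python
import math

def two_part_length(data, levels):
    """Total two-part message length in bits for Bernoulli data.

    Part 1 states the success probability p as one of `levels`
    equally spaced discrete values; part 2 is the length of an
    optimal (Shannon) code for the data under that stated p.
    Returns the minimum total over the discrete parameter values.
    """
    best = float("inf")
    for i in range(levels):
        p = (i + 0.5) / levels            # discretized parameter estimate
        part1 = math.log2(levels)         # bits to state which level was used
        part2 = -sum(math.log2(p if x else 1 - p) for x in data)
        best = min(best, part1 + part2)
    return best

data = [1] * 80 + [0] * 20    # 100 flips of a biased coin (hypothetical data)
raw = len(data)               # 100 bits if the flips are treated as purely random
coded = min(two_part_length(data, levels) for levels in (2, 4, 8, 16, 32, 64))
print(raw, round(coded, 1))   # the two-part code is well under the raw 100 bits
```

Note the trade-off the minimization captures: a finer parameter grid shortens part 2 (the estimate fits the data better) but lengthens part 1 (stating the estimate costs more bits), so the most compact total coding selects both the parameter value and its precision.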
