Minimum Message Length and Statistically Consistent Invariant (Objective?) Bayesian Probabilistic Inference—From (Medical) “Evidence”

“Evidence”, in the form of collected data and the analysis thereof, is fundamental to medicine, health and science. In this paper, we discuss the “evidence‐based” aspect of evidence‐based medicine in terms of statistical inference, acknowledging that this field often goes by various near‐synonymous names, such as inductive inference (amongst philosophers), econometrics (amongst economists), machine learning (amongst computer scientists) and, in more recent times, data mining (in some circles). Three issues central to this discussion of “evidence‐based” are (i) whether or not the statistical analysis can and/or should be objective and/or whether or not (subjective) prior knowledge can and/or should be incorporated, (ii) whether or not the analysis should be invariant to the framing of the problem (e.g. does it matter whether we analyse the ratio of proportions of morbidity to non‐morbidity rather than simply the proportion of morbidity?), and (iii) whether or not, as we get more and more data, our analysis should be able to converge arbitrarily closely to the process which is generating our observed data. For many problems of data analysis, it would appear that desiderata (ii) and (iii) above require us to invoke at least some form of subjective (Bayesian) prior knowledge. This sits uncomfortably with the understandable but perhaps impossible desire of many medical publications that at least all the statistical hypothesis testing be classical and non‐Bayesian, i.e. that no (subjective) prior knowledge be used.
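The framing question in (ii) can be made concrete with a small sketch. It is an illustration only and does not implement Minimum Message Length; the counts (3 morbid cases out of 10), the uniform prior on the proportion, and all function names are assumptions chosen for the example. It compares two estimators of the morbidity proportion p and of the odds p/(1-p): the maximum likelihood estimate, which gives the same answer under either framing, and the posterior mean, which does not.

```python
from fractions import Fraction


def odds(p):
    """Convert a proportion p into the odds p / (1 - p)."""
    return p / (1 - p)


def ml_proportion(k, n):
    """Maximum likelihood estimate of the proportion: k / n."""
    return Fraction(k, n)


def ml_odds(k, n):
    """Maximum likelihood estimate when the binomial model is
    parameterised directly by the odds r = p / (1 - p): k / (n - k)."""
    return Fraction(k, n - k)


def posterior_mean_proportion(k, n):
    """Posterior mean of p under an (assumed) uniform prior on p.
    The posterior is Beta(k + 1, n - k + 1), with mean (k + 1) / (n + 2)."""
    return Fraction(k + 1, n + 2)


def posterior_mean_odds(k, n):
    """Posterior mean of the odds under the same posterior.
    For Beta(a, b) with b > 1, E[p / (1 - p)] = a / (b - 1),
    so with a = k + 1 and b = n - k + 1 this is (k + 1) / (n - k)."""
    return Fraction(k + 1, n - k)


if __name__ == "__main__":
    k, n = 3, 10  # hypothetical data: 3 morbid cases out of 10

    # Maximum likelihood: estimating p and converting agrees with
    # estimating the odds directly.
    print("ML, p then odds:      ", odds(ml_proportion(k, n)))
    print("ML, odds directly:    ", ml_odds(k, n))

    # Posterior mean: the two framings disagree.
    print("Mean of p, then odds: ", odds(posterior_mean_proportion(k, n)))
    print("Mean of odds directly:", posterior_mean_odds(k, n))
```

With these assumed counts, the two maximum likelihood routes both give odds of 3/7, whereas converting the posterior mean of p gives 1/2 and averaging the odds directly gives 4/7. The disagreement comes from the choice of estimator and not from the data, which is the sense in which an analysis can depend on how the problem is framed.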
