Model selection using Minimal Message Length: an example using pollen data

In this paper we examine the use of the minimum message length criterion in evaluating alternative models of data when the samples are serially ordered in space and, implicitly, in time. Much data from vegetation studies can be arranged in a sequence, and in such cases the user may elect to constrain the clustering to contiguous zones in preference to an unconstrained clustering. We use the minimum message length principle to determine whether such a choice provides an effective model of the data. Pollen data provide a suitably organised set of samples, but have other properties that make it desirable to examine several different models for the distribution of palynomorphs within the clusters. The results suggest that zonation is not a strongly preferred model, since it captures only a small part of the pattern present in the data. Zonation encodes a user's expectation about the nature of variation in the data, and imposing it causes some pattern to be neglected. By using unconstrained clustering within zones, we can recover some of this overlooked pattern. We then examine other evidence for the nature of change in vegetation, and finally discuss the usefulness of the minimum message length principle as a guide to model choice and its relationship to other possible criteria.
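The comparison between a zoned and an unzoned model can be illustrated with a toy two-part message length calculation. The sketch below is not the procedure used in the paper: it assumes each sample is reduced to a single categorical label, codes the labels with a simple adaptive (add-one) multistate code, and charges an assumed uniform cost for stating the zone boundaries. The function names and the example sequences are hypothetical.

```python
import math
from collections import Counter


def adaptive_code_length(symbols, alphabet_size):
    """Bits needed to send `symbols` with an adaptive add-one multistate code.

    Assumption: a uniform prior over the state probabilities, so the total
    length depends only on the state counts, not on their order.
    """
    counts = Counter()
    bits = 0.0
    for n_seen, s in enumerate(symbols):
        p = (counts[s] + 1) / (n_seen + alphabet_size)
        bits -= math.log2(p)
        counts[s] += 1
    return bits


def zonation_message_length(symbols, boundaries, alphabet_size):
    """Two-part message length (bits) for a sequence split into contiguous zones.

    First part: the number of boundaries (uniform over 0..n-1) and which of
    the n-1 possible cut points are used. Second part: each zone coded
    separately. An empty `boundaries` list gives the unzoned model.
    """
    n = len(symbols)
    k = len(boundaries)
    model_bits = math.log2(n) + math.log2(math.comb(n - 1, k))
    edges = [0, *sorted(boundaries), n]
    data_bits = sum(
        adaptive_code_length(symbols[a:b], alphabet_size)
        for a, b in zip(edges, edges[1:])
    )
    return model_bits + data_bits


# Hypothetical sequences of dominant-taxon labels down a core.
abrupt_change = list("AAAAAAAABBBBBBBB")  # one clear transition
interleaved = list("ABABABABABABABAB")    # no zonal structure

for name, seq in [("abrupt", abrupt_change), ("interleaved", interleaved)]:
    unzoned = zonation_message_length(seq, [], alphabet_size=2)
    zoned = zonation_message_length(seq, [8], alphabet_size=2)
    print(f"{name}: unzoned = {unzoned:.2f} bits, zoned = {zoned:.2f} bits")
```

Under these assumptions the zoned model gives the shorter message only when the sequence really does change at the boundary; for the interleaved sequence the cost of stating the boundary is not repaid, which is the sense in which a shorter message indicates a more effective model.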
