Finite Mixture Modeling of Gaussian Regression Time Series with Application to Dendrochronology

Finite mixture modeling is a popular statistical technique that can accommodate a wide variety of shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average (ARMA) time series. Two novel estimation procedures for this framework are developed. The first yields conditional maximum likelihood estimates, which can be used when the time series are long. Simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields exact maximum likelihood estimates. A procedure for assessing the variability of the obtained estimates is discussed. We also show that the Bayesian information criterion can be used successfully to choose the number of mixture components and to correctly assess the time series orders. The performance of the developed methodology is evaluated in simulation studies. An application to the analysis of tree ring data is considered in detail. The results are very promising, as the proposed approach overcomes the limitations of other methods developed so far.
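To make the conditional likelihood and the Bayesian information criterion mentioned above concrete, the following is a minimal sketch for a single component, not the paper's estimation procedure. It assumes a regression with AR(1) errors, conditions on the first residual, and uses the common form BIC = -2 log L + p log n. The function names and the AR(1) simplification are illustrative assumptions.

```python
import numpy as np

def conditional_loglik_ar1(y, X, beta, phi, sigma2):
    """Conditional Gaussian log-likelihood of y = X @ beta + e, where
    e_t = phi * e_{t-1} + a_t and a_t ~ N(0, sigma2).
    Assumption: we condition on the first residual e_1, so only the
    innovations a_2, ..., a_T enter the likelihood."""
    e = y - X @ beta                 # regression residuals
    a = e[1:] - phi * e[:-1]         # AR(1) innovations
    n = a.size
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(a ** 2) / sigma2

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller values are preferred."""
    return -2.0 * loglik + n_params * np.log(n_obs)

# Illustrative usage on synthetic data (all values are made up).
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), np.arange(T) / T])  # intercept and trend
beta_true, phi_true = np.array([1.0, 2.0]), 0.5
e = np.zeros(T)
for t in range(1, T):
    e[t] = phi_true * e[t - 1] + rng.normal()
y = X @ beta_true + e

ll = conditional_loglik_ar1(y, X, beta_true, phi_true, 1.0)
score = bic(ll, n_params=4, n_obs=T - 1)  # beta (2), phi, sigma2
```

In a mixture setting, one such log-likelihood would be evaluated per component and combined with the mixing proportions; candidate models with different numbers of components or orders could then be compared by their BIC values.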
