Predictive modular neural networks for unsupervised segmentation of switching time series: the data allocation problem

In this paper, we explore some aspects of the problem of online unsupervised learning of a switching time series, i.e., a time series which is generated by a combination of several alternately activated sources. This learning problem can be solved by a two-stage approach: 1) separating and assigning each incoming datum to a specific dataset (one dataset corresponding to each source) and 2) developing one model per dataset (i.e., one model per source). We introduce a general data allocation (DA) methodology, which combines the two steps into an iterative scheme: existing models compete for the incoming data, and the data assigned to each model are used to refine that model. We distinguish between two modes of DA: in parallel DA, every incoming datablock is allocated to the model with the lowest prediction error; in serial DA, the incoming datablock is allocated to the first model with a prediction error below a prespecified threshold. We present sufficient conditions for asymptotically correct allocation of the data. We also present numerical experiments to support our theoretical analysis.
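
The two allocation modes described above can be illustrated with a minimal sketch. It is not the paper's implementation: the one-parameter predictor `AR1Model`, the helper `make_block`, the block length, the noise level, and the threshold value are all hypothetical choices made for illustration. In particular, the abstract does not say what serial DA does when no existing model falls below the threshold; the sketch assumes that a new model is created in that case.

```python
import random


class AR1Model:
    """Hypothetical predictor x[t] ~ a * x[t-1], refit by running least squares."""

    def __init__(self, a=0.0):
        self.a = a
        self.sxy = 0.0  # running sum of x[t-1] * x[t] over allocated blocks
        self.sxx = 0.0  # running sum of x[t-1] ** 2 over allocated blocks

    def error(self, block):
        # Mean squared one-step prediction error on the block.
        pairs = list(zip(block[:-1], block[1:]))
        return sum((y - self.a * x) ** 2 for x, y in pairs) / len(pairs)

    def update(self, block):
        # Refine the model using only the data allocated to it.
        for x, y in zip(block[:-1], block[1:]):
            self.sxy += x * y
            self.sxx += x * x
        if self.sxx > 0.0:
            self.a = self.sxy / self.sxx


def parallel_da(blocks, models):
    """Parallel DA: each block goes to the model with the lowest error."""
    labels = []
    for block in blocks:
        errors = [m.error(block) for m in models]
        k = min(range(len(models)), key=lambda i: errors[i])
        models[k].update(block)
        labels.append(k)
    return labels


def serial_da(blocks, threshold):
    """Serial DA: each block goes to the first model whose error is below threshold.

    Assumption (not stated in the abstract): if no model qualifies,
    a new model is created and receives the block.
    """
    models, labels = [], []
    for block in blocks:
        for k, m in enumerate(models):
            if m.error(block) < threshold:
                break
        else:
            models.append(AR1Model())
            k = len(models) - 1
        models[k].update(block)
        labels.append(k)
    return labels, models


def make_block(a, n, rng, noise=0.05):
    # Hypothetical source: AR(1) process with coefficient a, started at 1.0.
    x = [1.0]
    for _ in range(n - 1):
        x.append(a * x[-1] + rng.gauss(0.0, noise))
    return x


rng = random.Random(0)
coeffs = [0.9, -0.9]                       # two alternately active sources
sources = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # true source of each block
blocks = [make_block(coeffs[s], 20, rng) for s in sources]

parallel_labels = parallel_da(blocks, [AR1Model(0.5), AR1Model(-0.5)])
serial_labels, serial_models = serial_da(blocks, threshold=0.05)
print(parallel_labels)
print(serial_labels)
```

In this toy setting the two sources are easy to tell apart, so both modes recover the true segmentation. Parallel DA needs the number of models and distinct initial parameters up front, while serial DA, under the assumption above, grows the model set as new sources appear but depends on a well-chosen threshold.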
