A Bayesian Hybrid Approach to Unsupervised Time Series Discretization

Discretization is a key preprocessing step in knowledge discovery that makes raw time series data usable by symbolic data mining algorithms. To improve the comprehensibility of the mined results, or to help the induction step of the mining algorithms, it is natural to prefer discrete levels that can be mapped to intuitive symbols. In this paper, we aim to smooth the data points along the time axis and to bin or cluster them along the measurement axis. In particular, we propose a hybrid discretization method based on variational Bayes, in which the output of one discretization method is incorporated as the hyperparameters of another probabilistic discretization model, such as a continuous hidden Markov model. Experiments with artificial and real datasets demonstrate the usefulness of this hybrid approach.
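
The hybrid idea can be illustrated with a minimal sketch. It is not the paper's variational Bayes algorithm: as a simplifying assumption, it replaces the variational posterior with hard Viterbi assignments and a conjugate-normal MAP update of the state means. Equal-frequency binning stands in for the first discretization method, and its bin means serve as prior means (hyperparameters) of a Gaussian-emission hidden Markov model with "sticky" transitions that smooth along the time axis. All function names and the parameters `sigma`, `stay_prob`, and `kappa` are hypothetical choices made for illustration.

```python
import math

def equal_frequency_centers(series, k):
    """First discretizer: split sorted values into k equal-size bins, return bin means."""
    ordered = sorted(series)
    n = len(ordered)
    centers = []
    for i in range(k):
        chunk = ordered[i * n // k:(i + 1) * n // k]
        centers.append(sum(chunk) / len(chunk))
    return centers

def viterbi(series, means, sigma, stay_prob):
    """Most likely state path for a Gaussian-emission HMM with sticky transitions (k >= 2)."""
    k = len(means)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (k - 1))

    def log_emit(x, j):
        # Shared sigma, so the normalizing constant can be dropped.
        return -0.5 * ((x - means[j]) / sigma) ** 2

    score = [log_emit(series[0], j) for j in range(k)]  # uniform initial state prior
    back = []
    for x in series[1:]:
        prev, score, ptr = score, [], []
        for j in range(k):
            best = max(range(k),
                       key=lambda i: prev[i] + (log_stay if i == j else log_move))
            ptr.append(best)
            step = log_stay if best == j else log_move
            score.append(prev[best] + step + log_emit(x, j))
        back.append(ptr)
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last time step
        state = ptr[state]
        path.append(state)
    return path[::-1]

def map_means(series, path, prior_means, kappa):
    """MAP update: each prior mean counts as kappa pseudo-observations of its state."""
    sums = [kappa * m for m in prior_means]
    counts = [kappa] * len(prior_means)
    for x, s in zip(series, path):
        sums[s] += x
        counts[s] += 1
    return [s / c for s, c in zip(sums, counts)]

def hybrid_discretize(series, k=2, sigma=1.0, stay_prob=0.95, kappa=5.0, iters=5):
    """Binning output -> prior means -> alternate Viterbi decoding and MAP updates."""
    prior_means = equal_frequency_centers(series, k)
    means = list(prior_means)
    for _ in range(iters):
        path = viterbi(series, means, sigma, stay_prob)
        means = map_means(series, path, prior_means, kappa)
    return viterbi(series, means, sigma, stay_prob), means
```

For example, on the series `[0, 0, 0, 6, 0, 0, 10, 10, 10, 10, 10, 10]` with the defaults above, the isolated value 6 lies closer to the upper bin mean (10) than to the lower one (1), so pointwise binning would assign it to the upper level. In this sketch the sticky transitions keep it in the lower state, which is the kind of smoothing along the time axis that the abstract describes.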
