Probabilistic Sequence Translation-Alignment Model for Time-Series Classification

We address time-series classification with a novel probabilistic model of the conditional density of an observed sequence, which is treated as a time-warped and feature-transformed version of an underlying base sequence. We call it the probabilistic sequence translation-alignment model (PSTAM) because it captures both the alignment of features and the mapping between sequences, analogous to translating one language into another in machine translation. To handle general time series, we impose time-monotonicity constraints on the hidden alignment variables in the model parameter space; marginalizing these variables out allows class-specific time warping and feature transformation to be learned simultaneously. PSTAM thus combines the advantages of two approaches widely used in sequence classification: 1) alignment-based methods, which estimate distances between sequences of unequal length by directly comparing aligned features, and 2) model-based approaches, which can effectively capture class-specific patterns or trends. Furthermore, the low-dimensional modeling of the latent base sequence provides a way to discover the intrinsic manifold structure that may be present in the observed data, yielding an unsupervised manifold learning method for sequence data. We demonstrate the benefits of the proposed approach in a comprehensive set of evaluations on both synthetic and real-world sequence data sets.
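The idea of marginalizing over monotone alignments can be illustrated with a small dynamic program. The sketch below is not the PSTAM formulation from the paper; it is a minimal illustration under assumptions that do not come from the abstract: a linear-Gaussian feature map (hypothetical parameters `A`, `b`, `sigma`), a uniform prior over forward steps of at most `max_step` base frames, and an alignment that starts at the first base frame. The function name `log_marginal_monotone` is likewise hypothetical.

```python
import numpy as np

def log_marginal_monotone(x, z, A, b, sigma=1.0, max_step=2):
    """Log-score of x given base sequence z, summed over monotone alignments.

    Assumed toy model (not taken from the paper):
      x[t] ~ N(A @ z[a_t] + b, sigma^2 I), with a_1 = 0 and
      a_t - a_{t-1} drawn uniformly from {0, ..., max_step}.
    Steps that would run past the last base frame are simply dropped,
    so the result is an unnormalized score rather than a proper density.
    """
    T, d = x.shape
    K = z.shape[0]

    # Emission log-densities for every (observed frame, base frame) pair.
    mean = z @ A.T + b                           # (K, d)
    diff = x[:, None, :] - mean[None, :, :]      # (T, K, d)
    log_emit = (-0.5 * (diff ** 2).sum(-1) / sigma ** 2
                - 0.5 * d * np.log(2 * np.pi * sigma ** 2))

    log_step = -np.log(max_step + 1)             # uniform step prior

    # Forward recursion: alpha[k] = log-score of x[:t+1] with a_t = k.
    alpha = np.full(K, -np.inf)
    alpha[0] = log_emit[0, 0]
    for t in range(1, T):
        new = np.full(K, -np.inf)
        for s in range(min(max_step, K - 1) + 1):
            shifted = np.full(K, -np.inf)
            shifted[s:] = alpha[:K - s]          # move forward by s base frames
            new = np.logaddexp(new, shifted + log_step)
        alpha = new + log_emit[t]

    # Sum over the final alignment position.
    return np.logaddexp.reduce(alpha)
```

A classifier in this toy setting would keep one set of hypothetical parameters `(z, A, b)` per class and assign a test sequence to the class with the highest score. Because the recursion sums over alignments rather than picking the single best warping path, it differs from dynamic time warping, which keeps only the minimum-cost path.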
