Margin-maximizing classification of sequential data with infinitely-long temporal dependencies

Generative models for sequential data usually assume that temporal dependencies follow a first-order Markov chain. To relax this shallow modeling assumption, several authors have proposed models with higher-order dependencies. In most cases, however, prohibitive computational costs limit the practical applicability of these approaches. In addition, most existing approaches yield training algorithms whose objective functions have multiple spurious local optima, so tedious countermeasures are needed to avoid getting trapped in poor model estimates. In this paper, we devise a novel margin-maximizing model with a convex objective function that captures infinitely-long temporal dependencies in sequential datasets. To this end, we utilize a recently proposed nonparametric Bayesian model of label sequences with infinitely-long temporal dependencies, namely the sequence memoizer, and train our model using margin maximization together with a versatile mean-field-like approximation that increases computational efficiency. As we experimentally demonstrate, the margin-maximizing construction of our model, which leads to a convex optimization scheme without any spurious local optima, combined with its capacity to capture long and complex temporal dependencies, allows for obtaining exceptional pattern recognition performance in several applications.
