Max-Margin Infinite Hidden Markov Models

Infinite hidden Markov models (iHMMs) are nonparametric Bayesian extensions of hidden Markov models (HMMs) with an infinite number of states. Though flexible in describing sequential data, the generative formulation of iHMMs could limit their discriminative ability in sequential prediction tasks. Our paper introduces max-margin infinite HMMs (M2iHMMs), new infinite HMMs that explore the max-margin principle for discriminative learning. By using the theory of Gibbs classifiers and data augmentation, we develop efficient beam sampling algorithms without making restricting mean-field assumptions or truncated approximation. For single variate classification, M2iHMMs reduce to a new formulation of DP mixtures of max-margin machines. Empirical results on synthetic and real data sets show that our methods obtain superior performance than other competitors in both single variate classification and sequential prediction tasks.

[1]  Babak Shahbaba,et al.  Nonlinear Models Using Dirichlet Process Mixtures , 2007, J. Mach. Learn. Res..

[2]  Ning Chen,et al.  Gibbs max-margin topic models with data augmentation , 2013, J. Mach. Learn. Res..

[3]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[4]  Sotirios Chatzis,et al.  Infinite Markov-Switching Maximum Entropy Discrimination Machines , 2013, ICML.

[5]  Jonathan P. How,et al.  Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture , 2013, NIPS.

[6]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[7]  J. Pitman Combinatorial Stochastic Processes , 2006 .

[8]  Warren B. Powell,et al.  Dirichlet Process Mixtures of Generalized Linear Models , 2009, J. Mach. Learn. Res..

[9]  L. Devroye Non-Uniform Random Variate Generation , 1986 .

[10]  Lawrence K. Saul,et al.  Large Margin Hidden Markov Models for Automatic Speech Recognition , 2006, NIPS.

[11]  Bingbing Ni,et al.  RGBD-HuDaAct: A color-depth video database for human daily activity recognition , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[12]  François Laviolette,et al.  PAC-Bayesian learning of linear classifiers , 2009, ICML '09.

[13]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[14]  W. R. Schucany,et al.  Generating Random Variates Using Transformations with Multiple Roots , 1976 .

[15]  Ning Chen,et al.  Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines , 2011, ICML.

[16]  C. Antoniak Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems , 1974 .

[17]  J. Pitman Exchangeable and partially exchangeable random partitions , 1995 .

[18]  Yee Whye Teh,et al.  Beam sampling for the infinite hidden Markov model , 2008, ICML '08.

[19]  Thomas Hofmann,et al.  Hidden Markov Support Vector Machines , 2003, ICML.

[20]  Nicholas G. Polson,et al.  Data augmentation for support vector machines , 2011 .

[21]  Thomas L. Griffiths,et al.  The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies , 2007, JACM.

[22]  Jun Zhou,et al.  Mixing Linear SVMs for Nonlinear Classification , 2010, IEEE Transactions on Neural Networks.

[23]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[24]  Samy Bengio,et al.  A Parallel Mixture of SVMs for Very Large Scale Problems , 2001, Neural Computation.

[25]  John Shawe-Taylor,et al.  PAC-Bayes & Margins , 2002, NIPS.

[26]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[27]  Carl E. Rasmussen,et al.  Factorial Hidden Markov Models , 1997 .

[28]  Ning Chen,et al.  Gibbs Max-Margin Topic Models with Fast Sampling Algorithms , 2013, ICML.

[29]  Tommi S. Jaakkola,et al.  Maximum Entropy Discrimination , 1999, NIPS.

[30]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[31]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[32]  S. L. Scott Bayesian Methods for Hidden Markov Models , 2002 .