Multi-task learning for sequential data via iHMMs and the nested Dirichlet process

A new hierarchical nonparametric Bayesian model is proposed for multitask learning (MTL) with sequential data. Sequential data are typically modeled with a hidden Markov model (HMM), for which an appropriate model structure (the number of states) must often be chosen before learning. Here the sequential data from each task are modeled with an infinite hidden Markov model (iHMM), which avoids this model-selection problem. MTL across the iHMMs is implemented by imposing a nested Dirichlet process (nDP) prior on the base distributions of the iHMMs. The resulting nDP-iHMM MTL method performs task-level clustering and data-level clustering simultaneously, which improves learning for the individual iHMMs and reveals between-task similarities. Learning and inference for the nDP-iHMM MTL model are based on a Gibbs sampler. The effectiveness of the framework is demonstrated on synthetic data and on real music data.
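To make the two levels of clustering concrete, the following is a minimal sketch of a draw from a truncated nested Dirichlet process using the standard stick-breaking construction. It is not the paper's model or its Gibbs sampler: the truncation levels `K` and `L`, the concentration parameters `alpha` and `beta`, the function names, and the scalar Gaussian atoms are all illustrative assumptions. In the nDP-iHMM setting the atoms would instead parameterize the base distributions of the iHMMs.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Return truncated stick-breaking weights that sum to 1."""
    weights, remaining = [], 1.0
    for i in range(truncation):
        # The last break takes the rest of the stick so the weights sum to 1.
        if i == truncation - 1:
            frac = 1.0
        else:
            frac = rng.betavariate(1.0, concentration)
        weights.append(frac * remaining)
        remaining *= 1.0 - frac
    return weights


def sample_truncated_ndp(num_tasks, alpha, beta, K, L, seed=0):
    """Sketch of a truncated nDP draw (all names here are hypothetical)."""
    rng = random.Random(seed)
    # Outer level: weights over K candidate distributions (task-level clusters).
    pi = stick_breaking(alpha, K, rng)
    # Inner level: each candidate distribution has its own weights and atoms
    # (data-level clusters). Atoms come from an assumed N(0, 1) base measure.
    components = []
    for _ in range(K):
        w = stick_breaking(beta, L, rng)
        atoms = [rng.gauss(0.0, 1.0) for _ in range(L)]
        components.append((w, atoms))
    # Each task picks one candidate distribution; tasks that pick the same
    # index share that entire distribution.
    task_cluster = rng.choices(range(K), weights=pi, k=num_tasks)
    return pi, components, task_cluster


pi, components, task_cluster = sample_truncated_ndp(
    num_tasks=5, alpha=1.0, beta=1.0, K=4, L=3, seed=1
)
```

Tasks with the same entry in `task_cluster` share one `(weights, atoms)` pair, which is the sense in which the nDP clusters whole tasks while each shared distribution still clusters the data within a task.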
