Principled Generative-Discriminative Hybrid Hidden Markov Model

Extended abstract. In this work, we propose a new probabilistic model that generalizes the discriminative properties of a Conditional Random Field and the generative properties of a Hidden Markov Model for labeling and segmenting sequence data. We also present promising preliminary results from applying our model to several natural language processing tasks.

Modeling and learning probability distributions has a variety of applications in machine learning, and in classification in particular. The parameters of a probabilistic model can be learned so as to maximize an objective function of interest. When the goal is classification, for instance, one may maximize the probability of the correct label given the input (discriminative models). Alternatively, one may learn the entire joint probability distribution of the input and the output (generative models). For the past several years there has been a great deal of research on which objective function should be maximized and what trade-offs exist between the two training regimes, and several algorithms that combine the strengths of generative and discriminative models have been proposed. Recently, Minka [3] argued that the trade-off between generative and discriminative models lies in the choice of priors for the model. Taking this further, Lasserre et al. [4] suggested that there is a single principled way to train a probabilistic model so as to combine its generative and discriminative properties.

In the context of structured prediction, the Hidden Markov Model (HMM) has been widely used in many applications, such as natural language processing, protein structure prediction, and phoneme classification, to name a few. The HMM is a generative model; its discriminative counterpart is the Conditional Random Field (CRF) [2].
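The contrast between the two training regimes can be made concrete on a toy discrete model (all numbers below are hypothetical, purely for illustration): the generative objective sums log p(x, y) over the data, while the discriminative objective sums log p(y | x).

```python
import math

# Toy discrete model with binary label y and binary observation x.
# These parameter values are made up for illustration only.
p_y = {0: 0.6, 1: 0.4}                  # class prior p(y)
p_x_given_y = {0: {0: 0.7, 1: 0.3},     # emission p(x | y)
               1: {0: 0.2, 1: 0.8}}

def joint_log_lik(data):
    """Generative objective: sum over (x, y) of log p(x, y)."""
    return sum(math.log(p_y[y] * p_x_given_y[y][x]) for x, y in data)

def cond_log_lik(data):
    """Discriminative objective: sum of log p(y | x),
    where p(y | x) = p(x, y) / sum over y' of p(x, y')."""
    total = 0.0
    for x, y in data:
        joint = p_y[y] * p_x_given_y[y][x]
        evidence = sum(p_y[yp] * p_x_given_y[yp][x] for yp in p_y)
        total += math.log(joint / evidence)
    return total

data = [(0, 0), (1, 1), (0, 0), (1, 0)]
print(joint_log_lik(data), cond_log_lik(data))
```

Note that log p(y | x) = log p(x, y) - log p(x), and log p(x) ≤ 0, so the conditional log-likelihood always upper-bounds the joint log-likelihood on the same data; the two objectives generally favor different parameter settings, which is the trade-off the hybrid model navigates.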
One limitation of a CRF is that it cannot easily incorporate unlabeled data, and a large amount of training data is needed for a CRF to achieve good accuracy. We adapt the hybrid generative-discriminative framework to combine generative and discriminative trade-offs for Hidden Markov Models and CRFs, and propose a new model that is a generalization of both the CRF and the HMM.

Let X be the input alphabet and Y the output alphabet; let Xi = {x1, ..., xl}, xj ∈ X, be an input sequence and Yi = {y1, ..., yl}, yj ∈ Y, the corresponding output sequence. Given a labeled dataset D = {(Xi, Yi)}, i = 1, ..., N, and an unlabeled dataset UD = {Xi}, i = 1, ..., M, the goal is to find a function f : 2^X → 2^Y. One can find such a function by assuming a model with parameters θ, estimating the probability P(X, Y | θ), and using the maximum a posteriori principle to classify a new instance. As in [4], the parameters θ = {θG, θD} are assumed to be of two types: generative θG and discriminative θD. The joint distribution is given by:
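The equation itself appears to have been lost from this excerpt. Following the hybrid construction of Lasserre et al. [4], with θD parameterizing the conditional factor and θG the marginal factor, a plausible reconstruction (stated here as an assumption, not the authors' exact formula) is:

```latex
p(X, Y, \theta_G, \theta_D)
  = p(\theta_G, \theta_D)\, p(Y \mid X, \theta_D)\, p(X \mid \theta_G)
```

In this construction, a prior that forces θG = θD recovers a purely generative model, while a prior under which θG and θD are independent decouples training into a discriminative term and a generative term; intermediate priors interpolate between the two regimes.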

[1] D. K. Smith et al. Numerical Optimization, J. Oper. Res. Soc., 2001.

[2] John Lafferty, Andrew McCallum, Fernando C. N. Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML, 2001.

[3] T. Minka. Discriminative models, not discriminative training, 2005.

[4] Julia A. Lasserre, Christopher M. Bishop, Thomas P. Minka. Principled Hybrids of Generative and Discriminative Models, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2006.