Learning Hidden Markov Models with Distributed State Representations for Domain Adaptation

Recently, a variety of representation learning approaches have been developed in the literature to induce latent, generalizable features across two domains. In this paper, we extend standard hidden Markov models (HMMs) to learn distributed state representations that improve cross-domain prediction performance. We reformulate the HMM by mapping each discrete hidden state to a distributed representation vector and employ an expectation-maximization algorithm to jointly learn the distributed state representations and the model parameters. We empirically investigate the proposed model on cross-domain part-of-speech tagging and noun-phrase chunking tasks. The experimental results demonstrate the effectiveness of the distributed HMMs in facilitating domain adaptation.
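The abstract does not spell out how the distributed state vectors parameterize the HMM, so the following is only a minimal sketch of one plausible instantiation, not the authors' exact formulation: state embeddings score transitions through a bilinear form and emissions through dot products with word vectors, the E-step is standard forward-backward, and the M-step takes a gradient step on the expected complete-data log-likelihood. All names and dimensions (`K`, `V`, `D`, the matrices `E`, `W`, `U`) are illustrative assumptions.

```python
# Sketch of an HMM whose transition/emission distributions are driven by
# learned state embeddings, trained with an EM-style loop. Assumptions:
# bilinear transition scores E W E^T, emission scores E U^T, gradient M-step.
import numpy as np

rng = np.random.default_rng(0)

K, V, D = 4, 20, 8                         # hidden states, vocab size, embedding dim
E = rng.normal(scale=0.1, size=(K, D))     # state embeddings (learned)
W = rng.normal(scale=0.1, size=(D, D))     # bilinear transition weights (learned)
U = rng.normal(scale=0.1, size=(V, D))     # output word embeddings (learned)
pi = np.full(K, 1.0 / K)                   # initial state distribution (fixed here)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def trans():                               # T[i, j] = p(s_j | s_i)
    return softmax(E @ W @ E.T, axis=1)

def emit():                                # B[i, w] = p(w | s_i)
    return softmax(E @ U.T, axis=1)

def forward_backward(obs, T, B):
    """E-step for one sequence: state posteriors and expected transitions."""
    n = len(obs)
    alpha = np.zeros((n, K)); beta = np.zeros((n, K))
    alpha[0] = pi * B[:, obs[0]]; alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()         # scale to avoid underflow
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = T @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((K, K))                  # expected transition counts
    for t in range(n - 1):
        x = np.outer(alpha[t], B[:, obs[t + 1]] * beta[t + 1]) * T
        xi += x / x.sum()
    return gamma, xi

def em_step(seqs, lr=0.01):
    """One EM iteration: accumulate expected counts, then a gradient M-step."""
    global E, W, U
    T, B = trans(), emit()
    xi_tot = np.zeros((K, K)); ec = np.zeros((K, V))
    for obs in seqs:                       # E-step over all sequences
        gamma, xi = forward_backward(obs, T, B)
        xi_tot += xi
        for t, w in enumerate(obs):
            ec[:, w] += gamma[t]
    # Gradient of sum_ij counts_ij * log softmax(scores)_ij w.r.t. the scores:
    gT = xi_tot - xi_tot.sum(axis=1, keepdims=True) * T
    gB = ec - ec.sum(axis=1, keepdims=True) * B
    # Backpropagate through scores E W E^T and E U^T to the embeddings.
    E += lr * (gT @ E @ W.T + gT.T @ E @ W + gB @ U)
    W += lr * (E.T @ gT @ E)
    U += lr * (gB.T @ E)

# Toy usage: a few random observation sequences.
seqs = [rng.integers(0, V, size=12) for _ in range(5)]
for _ in range(20):
    em_step(seqs)
```

Because the transition and emission tables are tied through the shared embedding matrix `E`, states with similar behavior end up with nearby vectors, which is presumably the property that lets the representation generalize across domains.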
