Conditional Random Field Autoencoders for Unsupervised Structured Prediction

We introduce a framework for unsupervised learning of structured predictors with overlapping, global features. Each input's latent representation is predicted conditional on the observed data using a feature-rich conditional random field (CRF). Then a reconstruction of the input is (re)generated, conditional on the latent structure, using a generative model that factorizes similarly to the CRF. The autoencoder formulation enables efficient exact inference without resorting to unrealistic independence assumptions or restricting the kinds of features that can be used. We illustrate connections to traditional autoencoders, posterior regularization, and multi-view learning. We then demonstrate competitive results with instantiations of the framework on two canonical natural language processing tasks, part-of-speech induction and bitext word alignment, and show that training the proposed model can be substantially more efficient than training a comparable feature-rich baseline.
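Concretely, the framework pairs an encoding CRF with a reconstruction model. The following is a minimal sketch in our own notation (the abstract fixes no symbols; we assume a linear-chain factorization, as in the sequence-labeling instantiations). For an input $x = (x_1, \dots, x_n)$ with latent structure $y = (y_1, \dots, y_n)$ and reconstruction $\hat{x}$:

\[
p_\lambda(y \mid x) = \frac{\exp\{\lambda^\top f(x, y)\}}{\sum_{y'} \exp\{\lambda^\top f(x, y')\}},
\qquad
p_\theta(\hat{x} \mid y) = \prod_{i=1}^{n} p_\theta(\hat{x}_i \mid y_i),
\]

and training maximizes the log marginal likelihood of reconstructing each observed input in the corpus $\mathcal{D}$,

\[
\mathcal{L}(\lambda, \theta) = \sum_{x \in \mathcal{D}} \log \sum_{y} p_\lambda(y \mid x)\, p_\theta(\hat{x} = x \mid y),
\]

where the inner sum over $y$ is computed exactly by the forward algorithm when the features $f$ decompose over adjacent labels. This is the sense in which inference remains exact: the features may look at arbitrary, overlapping, global properties of the observed input $x$, so long as they factorize over the latent structure $y$.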
