Graphical Models for Primarily Unsupervised Sequence Labeling

Neal Parikh (neal@nparikh.org), Mark Dredze (mdredze@seas.upenn.edu)
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104

Most models used in natural language processing must be trained on large corpora of labeled text. This tutorial explores a "primarily unsupervised" approach, based on graphical models, that augments a corpus of unlabeled text with some form of prior domain knowledge but does not require any fully labeled examples. We survey probabilistic graphical models for (supervised) classification and sequence labeling, and then present in detail the prototype-driven approach of Haghighi and Klein (2006) to sequence labeling, including a discussion of the theory and implementation of both conditional random fields and prototype learning. We show experimental results for English part-of-speech tagging.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-07-18. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/638

[1]  Xiaojin Zhu,et al.  --1 CONTENTS , 2006 .

[2]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[3]  Thomas G. Dietterich,et al.  Training conditional random fields via gradient tree boosting , 2004, ICML.

[4]  Fabrizio Sebastiani,et al.  A Tutorial on Automated Text Categorisation , 2000 .

[5]  L. Williams,et al.  Contents , 2020, Ophthalmology (Rochester, Minn.).

[6]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[7]  Virginia Teller.  Review of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin, Prentice Hall 2000 , 2000 .

[8]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[9]  Jorge Nocedal,et al.  On the limited memory BFGS method for large scale optimization , 1989, Math. Program.

[10]  Dale Schuurmans,et al.  Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling , 2006, ACL.

[11]  Yoshua Bengio,et al.  Semi-supervised Learning by Entropy Minimization , 2004, CAP.

[12]  Michael I. Jordan,et al.  On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.

[13]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.

[14]  Hinrich Schütze,et al.  Distributional Part-of-Speech Tagging , 1995, EACL.

[15]  Andrew McCallum,et al.  Gene Prediction with Conditional Random Fields , 2005 .

[16]  Nasser M. Nasrabadi,et al.  Pattern Recognition and Machine Learning , 2006, Technometrics.

[17]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[18]  Eugene Charniak,et al.  Statistical Techniques for Natural Language Parsing , 1997, AI Mag..

[19]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[20]  Tong Zhang,et al.  A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res.

[21]  Sabine Buchholz,et al.  Introduction to the CoNLL-2000 Shared Task Chunking , 2000, CoNLL/LLL.

[22]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[23]  Susan T. Dumais,et al.  A Bayesian Approach to Filtering Junk E-Mail , 1998, AAAI 1998.

[24]  Andrew McCallum,et al.  Using Maximum Entropy for Text Classification , 1999 .

[25]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[26]  P. Gehler,et al.  An introduction to graphical models , 2001 .

[27]  S. Sathiya Keerthi,et al.  Large scale semi-supervised linear SVMs , 2006, SIGIR.

[28]  Michael I. Jordan.  Graphical Models , 2003 .

[29]  James E. Galagan,et al.  Comparative Gene Prediction using Conditional Random Fields , 2006, NIPS.

[30]  Hanna M. Wallach,et al.  Efficient Training of Conditional Random Fields , 2002 .

[31]  J. Langford.  Tutorial on Practical Prediction Theory for Classification , 2005, J. Mach. Learn. Res.

[32]  Yuan Qi,et al.  Bayesian Conditional Random Fields , 2005, AISTATS.

[33]  William W. Cohen,et al.  Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text , 2005, HLT.

[34]  Thorsten Joachims,et al.  Transductive Inference for Text Classification using Support Vector Machines , 1999, ICML.

[35]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[36]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields for Relational Learning , 2007 .

[37]  Hanna M. Wallach,et al.  Conditional Random Fields: An Introduction , 2004 .

[38]  Dan Klein,et al.  Prototype-Driven Learning for Sequence Models , 2006, NAACL.

[39]  Andrew McCallum,et al.  Piecewise pseudolikelihood for efficient training of conditional random fields , 2007, ICML '07.

[40]  D. Ruppert.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[41]  David Madigan,et al.  Large-Scale Bayesian Logistic Regression for Text Categorization , 2007, Technometrics.

[42]  David Heckerman,et al.  A Tutorial on Learning with Bayesian Networks , 1999, Innovations in Bayesian Networks.

[43]  Jason Weston,et al.  Large-Scale Semi-Supervised Learning , 2007, NATO ASI Mining Massive Data Sets for Security.

[44]  Gideon S. Mann,et al.  Simple, robust, scalable semi-supervised learning via expectation regularization , 2007, ICML '07.

[45]  James H. Martin,et al.  Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition , 2000 .

[46]  Tom Mitchell.  Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression, Machine Learning , 2005 .

[47]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .