Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

One of the main obstacles to producing high quality joint models is the lack of jointly annotated data. Joint modeling of multiple natural language processing tasks outperforms single-task models learned from the same data, but still under-performs compared to single-task models learned on the more abundant quantities of available single-task annotated data. In this paper we present a novel model which makes use of additional single-task annotated data to improve the performance of a joint model. Our model utilizes a hierarchical prior to link the feature weights for shared features in several single-task models and the joint model. Experiments on joint parsing and named entity recognition, using the OntoNotes corpus, show that our hierarchical joint model can produce substantial gains over a joint model trained on only the jointly annotated data.

[1]  Richard Johansson,et al.  The CoNLL 2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies , 2008, CoNLL.

[2]  Stephen Clark,et al.  Joint Word Segmentation and POS Tagging Using a Single Perceptron , 2008, ACL.

[3]  Christopher D. Manning,et al.  Hierarchical Bayesian Domain Adaptation , 2009, NAACL.

[4]  Daphne Koller,et al.  Convex Point Estimation using Undirected Bayesian Transfer Hierarchies , 2008, UAI.

[5]  Andrew McCallum,et al.  Joint Parsing and Semantic Role Labeling , 2005, CoNLL.

[6]  Christopher D. Manning,et al.  Nested Named Entity Recognition , 2009, EMNLP.

[7]  Christopher D. Manning,et al.  Joint Parsing and Named Entity Recognition , 2009, NAACL.

[8]  Anton Schwaighofer,et al.  Learning Gaussian processes from multiple tasks , 2005, ICML.

[9]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[10]  Mitchell P. Marcus,et al.  OntoNotes: The 90% Solution , 2006, NAACL.

[11]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[12]  Kristina Toutanova,et al.  A global model for joint lemmatization and part-of-speech prediction , 2009, ACL.

[13]  Tong Zhang,et al.  A High-Performance Semi-Supervised Learning Method for Text Chunking , 2005, ACL.

[14]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[15]  Galen Andrew,et al.  A Hybrid Markov/Semi-Markov Conditional Random Field for Sequence Segmentation , 2006, EMNLP.

[16]  Hal Daumé,et al.  Frustratingly Easy Domain Adaptation , 2007, ACL.

[17]  Scott Miller,et al.  A Novel Use of Statistical Parsing to Extract Information from Text , 2000, ANLP.

[18]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[19]  Michael Elhadad,et al.  An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation , 2006, ACL.

[20]  Reut Tsarfaty,et al.  A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing , 2008, ACL.

[21]  Jonathan Baxter,et al.  A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling , 1997, Machine Learning.

[22]  Richard Johansson,et al.  Dependency-based Syntactic–Semantic Analysis with PropBank and NomBank , 2008, CoNLL.

[23]  Lawrence Carin,et al.  Multi-Task Learning for Classification with Dirichlet Process Priors , 2007, J. Mach. Learn. Res..

[24]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[25]  Joseph Hilbe,et al.  Data Analysis Using Regression and Multilevel/Hierarchical Models , 2009 .

[26]  Christopher D. Manning,et al.  Efficient, Feature-based, Conditional Random Field Parsing , 2008, ACL.