Multi-Task Structured Prediction for Entity Analysis: Search-Based Learning Algorithms

Entity analysis in natural language processing involves solving multiple structured prediction problems, such as mention detection, coreference resolution, and entity linking. We explore search-based learning approaches to multi-task structured prediction (MTSP) in the context of entity analysis. We study three search architectures for MTSP problems, each making a different tradeoff between the speed and the accuracy of training and inference. In all three architectures, we learn one or more scoring functions that employ both intra-task and inter-task features. In the “pipeline” architecture, which is the fastest, we solve the tasks one after another in a pipelined fashion. In the “joint” architecture, which is the most expensive, we formulate MTSP as a single-task structured prediction problem and search the joint space of multi-task structured outputs. To speed up the joint architecture, we introduce two pruning methods and associated learning techniques. In the intermediate “cyclic” architecture, we cycle through the tasks multiple times in sequence until there is no performance improvement. Results on two benchmark domains show that the joint architecture improves over the pipeline approach as well as the previous state-of-the-art approach based on graphical models. The cyclic architecture is faster than the joint approach and achieves competitive performance.
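The control flow of the three architectures can be illustrated with a toy sketch. It is not the learning algorithm of the paper: the scoring function is hand-written rather than learned, each task has a single binary output, the cyclic stopping rule uses the score instead of task performance, and the joint space is enumerated exhaustively rather than searched with pruning. All names (`pipeline`, `cyclic`, `joint`, `toy_score`, `TASKS`, `LABELS`) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

Outputs = Dict[str, int]            # task name -> chosen label (toy stand-in for a structured output)
Score = Callable[[Outputs], float]  # must accept partial assignments


def pipeline(tasks: List[str], labels: Dict[str, List[int]], score: Score) -> Outputs:
    """Solve each task once, in order; later tasks see earlier outputs."""
    y: Outputs = {}
    for t in tasks:
        y[t] = max(labels[t], key=lambda v: score({**y, t: v}))
    return y


def cyclic(tasks: List[str], labels: Dict[str, List[int]], score: Score,
           max_cycles: int = 10) -> Outputs:
    """Re-solve the tasks in sequence until a full cycle brings no improvement."""
    y = pipeline(tasks, labels, score)
    best = score(y)
    for _ in range(max_cycles):
        cand = dict(y)
        for t in tasks:
            # Each task is re-solved with the current outputs of all other tasks fixed.
            cand[t] = max(labels[t], key=lambda v: score({**cand, t: v}))
        s = score(cand)
        if s <= best:
            break
        y, best = cand, s
    return y


def joint(tasks: List[str], labels: Dict[str, List[int]], score: Score) -> Outputs:
    """Treat all tasks as one problem and pick the best joint assignment."""
    combos = product(*(labels[t] for t in tasks))
    return max((dict(zip(tasks, c)) for c in combos), key=score)


def toy_score(y: Outputs) -> float:
    """Hand-written score (an assumption): intra-task terms plus one inter-task term."""
    s = 0.0
    if y.get("coref") == 0:
        s += 1.0   # intra-task preference for coref
    if y.get("link") == 1:
        s += 3.0   # intra-task preference for link
    if "coref" in y and "link" in y and y["coref"] == y["link"]:
        s += 1.5   # inter-task agreement bonus
    return s


TASKS = ["coref", "link"]
LABELS = {"coref": [0, 1], "link": [0, 1]}

if __name__ == "__main__":
    for solve in (pipeline, cyclic, joint):
        out = solve(TASKS, LABELS, toy_score)
        print(solve.__name__, out, toy_score(out))
```

In this toy setting the pipeline commits to its first task before the second task's evidence is available, while the cyclic and joint variants can revise that decision through the inter-task term.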
