A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning

We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words, and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model, which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
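To make the weight-sharing idea concrete, here is a minimal sketch in PyTorch: one shared embedding-plus-convolution trunk feeds a separate linear head per task, so a gradient step on any task updates the shared parameters. The layer sizes, task names, label counts, and the single training step below are illustrative assumptions, not the paper's exact configuration (the paper also trains a language-model task on unlabeled text with a ranking criterion, which this sketch omits).

```python
# Hypothetical sketch of multitask weight-sharing: a shared trunk with
# task-specific heads. All sizes and tasks here are illustrative.
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Embedding + 1-D convolution shared by every task (the weight-sharing)."""
    def __init__(self, vocab_size=50_000, embed_dim=50, hidden_dim=100, window=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Convolution over the word positions plays the role of a sliding window.
        self.conv = nn.Conv1d(embed_dim, hidden_dim,
                              kernel_size=window, padding=window // 2)

    def forward(self, tokens):                           # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)           # (batch, embed_dim, seq_len)
        return torch.tanh(self.conv(x)).transpose(1, 2)  # (batch, seq_len, hidden_dim)

class MultiTaskTagger(nn.Module):
    """One shared trunk, one linear output head per task."""
    def __init__(self, trunk, hidden_dim, task_sizes):
        super().__init__()
        self.trunk = trunk
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden_dim, n) for task, n in task_sizes.items()}
        )

    def forward(self, tokens, task):
        return self.heads[task](self.trunk(tokens))      # per-token logits for `task`

# Joint training alternates between tasks; each step updates the shared trunk.
trunk = SharedTrunk()
model = MultiTaskTagger(trunk, hidden_dim=100,
                        task_sizes={"pos": 45, "chunk": 23, "ner": 9, "srl": 67})
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 50_000, (8, 30))   # dummy batch: 8 sentences, 30 tokens
labels = torch.randint(0, 45, (8, 30))       # dummy POS tags for that batch
optimizer.zero_grad()
logits = model(tokens, task="pos")
loss = loss_fn(logits.reshape(-1, 45), labels.reshape(-1))
loss.backward()                               # gradients flow into the shared trunk
optimizer.step()
```

Because every head backpropagates through the same trunk, each task acts as a regularizer for the others; this is the mechanism by which joint training improves generalization on the shared tasks.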
