Hybrid Neural Models For Sequence Modelling: The Best Of Three Worlds

We propose a neural architecture combining the main characteristics of the most successful neural models of recent years: bidirectional RNNs, the encoder-decoder structure, and the Transformer model. Evaluation on three sequence labelling tasks yields results close to the state of the art on all tasks, and better on some of them, showing the pertinence of this hybrid architecture for this kind of task.
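To make the combination concrete, below is a minimal sketch (not the authors' code, whose details are not given here) of a sequence-labelling model that stacks the three named components: a bidirectional RNN encoder, Transformer-style multi-head self-attention, and a decoder RNN giving the encoder-decoder structure. All module names, layer sizes, and wiring choices are hypothetical illustrations, assuming a PyTorch implementation.

```python
import torch
import torch.nn as nn

class HybridTagger(nn.Module):
    """Hypothetical hybrid of the three 'worlds' named in the abstract."""
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden_dim=256,
                 num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # First component: a bidirectional RNN encoder over the input tokens.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Second component: Transformer-style multi-head self-attention
        # applied to the encoder states.
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                          batch_first=True)
        # Third component: a decoder RNN over the attended states, giving
        # the encoder-decoder structure.
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):                 # tokens: (batch, seq)
        x = self.embed(tokens)                 # (batch, seq, emb_dim)
        enc, _ = self.encoder(x)               # (batch, seq, 2*hidden_dim)
        ctx, _ = self.attn(enc, enc, enc)      # self-attention over encoder
        dec, _ = self.decoder(ctx)             # (batch, seq, hidden_dim)
        return self.out(dec)                   # one tag score per token

# Usage: per-token tag scores, as required for sequence labelling.
model = HybridTagger(vocab_size=1000, num_tags=10)
scores = model(torch.randint(0, 1000, (2, 7)))  # shape: (2, 7, 10)
```

In this sketch the decoder runs output-wise over the attended encoder states rather than autoregressively over predicted labels; either design would fit the abstract's description, and the choice here is only for brevity.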
