Hybrid Neural Models For Sequence Modelling: The Best Of Three Worlds

We propose a neural architecture combining the main characteristics of the most successful neural models of recent years: bidirectional RNNs, the encoder-decoder structure, and the Transformer model. Evaluation on three sequence labelling tasks yields results close to the state of the art on all tasks, and better on some of them, showing the pertinence of this hybrid architecture for this kind of task.
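To make the combination concrete, below is a minimal sketch (not the authors' code, whose details are not given here) of a sequence-labelling model that stacks the three named components: a bidirectional RNN encoder, Transformer-style multi-head self-attention, and a decoder RNN giving the encoder-decoder structure. All module names, layer sizes, and wiring choices are hypothetical illustrations, assuming a PyTorch implementation.

```python
import torch
import torch.nn as nn

class HybridTagger(nn.Module):
    """Hypothetical hybrid of the three 'worlds' named in the abstract."""
    def __init__(self, vocab_size, num_tags, emb_dim=128, hidden_dim=256,
                 num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # First component: a bidirectional RNN encoder over the input tokens.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Second component: Transformer-style multi-head self-attention
        # applied to the encoder states.
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                          batch_first=True)
        # Third component: a decoder RNN over the attended states, giving
        # the encoder-decoder structure.
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_tags)

    def forward(self, tokens):                 # tokens: (batch, seq)
        x = self.embed(tokens)                 # (batch, seq, emb_dim)
        enc, _ = self.encoder(x)               # (batch, seq, 2*hidden_dim)
        ctx, _ = self.attn(enc, enc, enc)      # self-attention over encoder
        dec, _ = self.decoder(ctx)             # (batch, seq, hidden_dim)
        return self.out(dec)                   # one tag score per token

# Usage: per-token tag scores, as required for sequence labelling.
model = HybridTagger(vocab_size=1000, num_tags=10)
scores = model(torch.randint(0, 1000, (2, 7)))  # shape: (2, 7, 10)
```

In this sketch the decoder runs output-wise over the attended encoder states rather than autoregressively over predicted labels; either design would fit the abstract's description, and the choice here is only for brevity.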
