Paper Information - A Survey of the Usages of Deep Learning for Natural Language Processing

Over the last several years, the field of natural language processing has been propelled forward by an explosion in the use of deep learning models. This article provides a brief introduction to the field and a quick overview of deep learning architectures and methods. It then sifts through the plethora of recent studies and summarizes a large assortment of relevant contributions. Analyzed research areas include several core linguistic processing issues in addition to many applications of computational linguistics. A discussion of the current state of the art is then provided along with recommendations for future research in the field.


[300] Trevor Darrell,et al. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[301] Jugal K. Kalita,et al. Detecting and Extracting Events from Text Documents , 2016, ArXiv.

[302] Hang Li,et al. Convolutional Neural Network Architectures for Matching Natural Language Sentences , 2014, NIPS.

[303] John B. Goodenough,et al. Contextual correlates of synonymy , 1965, CACM.

[304] Slav Petrov,et al. Improved Transition-Based Parsing and Tagging with Neural Networks , 2015, EMNLP.

[305] Nanyun Peng,et al. Towards Controllable Story Generation , 2018 .

[306] Jun Zhao,et al. Relation Classification via Convolutional Deep Neural Network , 2014, COLING.

[307] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .

[308] Chris Quirk,et al. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources , 2004, COLING.

[309] Richard Socher,et al. Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[310] Timothy Dozat,et al. Universal Dependency Parsing from Scratch , 2019, CoNLL.

[311] M. Marelli,et al. SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment , 2014, *SEMEVAL.

[312] James Cross,et al. Incremental Parsing with Minimal Features Using Bi-Directional LSTM , 2016, ACL.

[313] David Ahn,et al. The stages of event extraction , 2006 .

[314] Xiaodong Liu,et al. Stochastic Answer Networks for Natural Language Inference , 2018, ArXiv.

[315] Hwee Tou Ng,et al. Towards Robust Linguistic Analysis using OntoNotes , 2013, CoNLL.

[316] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.

[317] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[318] Omer Levy,et al. Constant-Time Machine Translation with Conditional Masked Language Models , 2019, IJCNLP 2019.

[319] J. J. Rocchio,et al. Relevance feedback in information retrieval , 1971 .

[320] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[321] Martin Wattenberg,et al. Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation , 2016, TACL.

[322] Tao Wang,et al. Deep learning with COTS HPC systems , 2013, ICML.

[323] Very Large Corpora. Empirical Methods in Natural Language Processing , 1999 .

[324] R. Thomas McCoy,et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference , 2019, ACL.

[325] Nizar Habash,et al. CoNLL-UL: Universal Morphological Lattices for Universal Dependency Parsing , 2018, LREC.

[326] Noah A. Smith,et al. Transition-Based Dependency Parsing with Stack Long Short-Term Memory , 2015, ACL.

[327] Roland Vollgraf,et al. Contextual String Embeddings for Sequence Labeling , 2018, COLING.

[328] Joakim Nivre,et al. An Efficient Algorithm for Projective Dependency Parsing , 2003, IWPT.

[329] David A. Forsyth,et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[330] Xiaodong Liu,et al. Multi-Task Deep Neural Networks for Natural Language Understanding , 2019, ACL.

[331] Yijia Liu,et al. Parsing Tweets into Universal Dependencies , 2018, NAACL.

[332] Xueqi Cheng,et al. Text Matching as Image Recognition , 2016, AAAI.