Bridging CNNs, RNNs, and Weighted Finite-State Machines

Recurrent and convolutional neural networks comprise two distinct families of models that have proven useful for encoding natural language utterances. In this paper we present SoPa, a new model that aims to bridge these two approaches. SoPa combines neural representation learning with weighted finite-state automata (WFSAs) to learn a soft version of traditional surface patterns. We show that SoPa is an extension of a one-layer CNN, and that such CNNs are equivalent to a restricted version of SoPa and, accordingly, to a restricted form of WFSA. Empirically, on three text classification tasks, SoPa is comparable to or better than both a BiLSTM (RNN) baseline and a CNN baseline, and is particularly useful in small data settings.
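The claimed link between a one-layer CNN and a restricted WFSA can be illustrated with a small numerical sketch. The code below is not the authors' implementation. It assumes a linear-chain automaton with only main-path transitions (no self-loops or epsilon moves), transition scores that are affine in the word vector, and a max-sum semiring; the function names `cnn_filter_max_pool` and `chain_wfsa_max_sum` are hypothetical and chosen for this illustration only. Under those assumptions, the best-scoring match of the automaton should coincide with a width-`d` convolutional filter followed by max-pooling.

```python
import numpy as np


def cnn_filter_max_pool(X, W, biases):
    """Score of one width-d CNN filter followed by max-pooling over time.

    X: (n, dim) word vectors, W: (d, dim) filter weights, biases: (d,).
    Assumes n >= d so that at least one window exists.
    """
    n, d = X.shape[0], W.shape[0]
    window_scores = [
        sum(W[i] @ X[t + i] + biases[i] for i in range(d))
        for t in range(n - d + 1)
    ]
    return max(window_scores)


def chain_wfsa_max_sum(X, W, biases):
    """Best match score of a linear-chain WFSA in the max-sum semiring.

    Illustrative assumption: states 0..d, and the only transitions are
    i -> i+1, each consuming one token x with score W[i] @ x + biases[i].
    A match may start at any token (state 0 is always reachable with score 0).
    """
    d = W.shape[0]
    h = np.full(d + 1, -np.inf)  # h[i]: best score of a partial match in state i
    h[0] = 0.0
    best = -np.inf
    for x in X:
        trans = W @ x + biases           # score of each main-path transition on x
        new_h = np.full(d + 1, -np.inf)
        new_h[0] = 0.0                   # a new match can begin at the next token
        new_h[1:] = h[:-1] + trans       # advance every partial match by one state
        h = new_h
        best = max(best, h[d])           # state d is the accepting state
    return best


rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))       # 7 tokens, 4-dimensional embeddings
W = rng.normal(size=(3, 4))       # pattern / filter of width 3
biases = rng.normal(size=3)

print(cnn_filter_max_pool(X, W, biases), chain_wfsa_max_sum(X, W, biases))
```

The two printed values should agree up to floating-point rounding. Adding self-loops or epsilon transitions to the automaton would let a match span a variable number of tokens, which a fixed-width filter cannot express; this is the sense in which the sketch treats the CNN as the restricted case.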

Roy Schwartz | Noah A. Smith | Sam Thomson
