Symbolic, Distributed, and Distributional Representations for Natural Language Processing in the Era of Deep Learning: A Survey

Natural language is inherently a discrete symbolic representation of human knowledge. Recent advances in machine learning (ML) and natural language processing (NLP) seem to contradict this intuition: discrete symbols are fading away, replaced by vectors and tensors called distributed and distributional representations. However, there is a strict link between distributed/distributional representations and discrete symbols, the former being an approximation of the latter. A clearer understanding of this link may lead to radically new deep learning networks. In this paper we present a survey that aims to renew the connection between symbolic representations and distributed/distributional representations. This is the right time to revitalize the area of interpreting how discrete symbols are represented inside neural networks.
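The sense in which a distributed representation approximates discrete symbols can be illustrated with a minimal sketch, not taken from the survey itself: each symbol is assigned a random high-dimensional vector, and because such vectors are nearly orthogonal (by the Johnson-Lindenstrauss lemma), a sum of symbol vectors still supports approximate recovery of which symbols it contains. The vocabulary, dimensionality, and decoding threshold below are illustrative choices, not values prescribed by any particular method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 300                      # dimensionality of the distributed space
vocab = ["dog", "cat", "runs", "sleeps"]

# Assign each discrete symbol a random Gaussian vector, scaled so that
# E[v . v] = 1. In high dimensions these vectors are nearly orthogonal,
# so distinct symbols remain approximately distinguishable.
embedding = {w: rng.standard_normal(d) / np.sqrt(d) for w in vocab}

# A bag of symbols becomes the sum of their vectors: a distributed
# (and lossy) representation of a discrete structure.
sentence = embedding["dog"] + embedding["runs"]

# Approximate symbolic decoding: the dot product with a symbol vector is
# close to 1 if the symbol is present and close to 0 otherwise.
scores = {w: float(sentence @ v) for w, v in embedding.items()}
present = {w for w, s in scores.items() if s > 0.5}
print(present)
```

Decoding here is probabilistic rather than exact: the cross-talk between random vectors shrinks as the dimensionality grows, which is precisely why the distributed representation behaves as an approximation of the underlying discrete symbols rather than a faithful copy.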
