A graph digital signal processing method for semantic analysis

This paper addresses the problem of devising a computationally tractable procedure for representing natural language understanding (NLU). It approaches this goal by using distributional models of meaning together with a method from graph-based digital signal processing (DSP), which has only recently attracted the attention of researchers in natural language processing (NLP) working on big data analysis. The novelty of our approach lies in the combination of three domains: advances in deep learning algorithms for word representation, dependency parsing for modeling inter-word relations, and convolution with orthogonal Hadamard codes for composing the first two, which generates a unique representation for the sentence. Two types of problems are solved in a new, unified way: sentence similarity, given by the cosine of the angle between the corresponding vectors, and question answering, where the query is matched to possible answers. The technique resembles spread spectrum methods from telecommunication theory, where multiple users share a common channel and are able to communicate without interference. In the context of this paper, individual words play the role of users sharing the same sentence. Examples of applying the method to a standard set of sentences, used for benchmarking accuracy and execution time, are also given.
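The spread spectrum analogy can be illustrated with a minimal sketch. It is not the paper's implementation: the toy random word vectors, the mapping from dependency roles to Hadamard rows (`ROLE_CODE`), and the helper names `spread`, `despread`, and `encode` are all assumptions made for illustration. Each word vector is spread with an orthogonal code, the spread signals are summed into one sentence vector, and a word can be recovered by correlating with its code, much as a receiver separates users on a shared channel.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def spread(word_vectors, codes):
    """Sum of Kronecker products code_i (x) word_i: one shared 'channel'."""
    return sum(np.kron(c, w) for c, w in zip(codes, word_vectors))

def despread(sentence, code, dim):
    """Correlate the sentence vector with one code to recover that word."""
    chips = sentence.reshape(len(code), dim)
    return code @ chips / len(code)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Assumption: toy random embeddings stand in for learned word vectors.
rng = np.random.default_rng(0)
dim = 8
vocab = {w: rng.normal(size=dim) for w in ["dog", "cat", "chases"]}

# Assumption: one Hadamard row per dependency role (hypothetical mapping).
H = hadamard(4)
ROLE_CODE = {"root": H[1], "nsubj": H[2], "dobj": H[3]}

def encode(parsed):
    """parsed: list of (word, dependency_role) pairs for one sentence."""
    words = [vocab[w] for w, _ in parsed]
    codes = [ROLE_CODE[r] for _, r in parsed]
    return spread(words, codes)

s1 = encode([("chases", "root"), ("dog", "nsubj"), ("cat", "dobj")])
s2 = encode([("chases", "root"), ("cat", "nsubj"), ("dog", "dobj")])

# The subject of s1 is recovered without interference from the other words.
recovered = despread(s1, ROLE_CODE["nsubj"], dim)
print(np.allclose(recovered, vocab["dog"]))

# Sentence similarity as the cosine between sentence vectors.
print(cosine(s1, s2))
```

Because the Hadamard rows are mutually orthogonal, swapping subject and object changes the sentence vector even though the bag of words is identical, so the two toy sentences have a cosine similarity below one.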
