Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure

Analysis and processing of very large data sets, or big data, poses a significant challenge. Massive data sets are collected and studied in numerous domains, from engineering sciences to social networks, biomolecular research, commerce, and security. Extracting valuable information from big data requires innovative approaches that process large amounts of data efficiently and that not only handle but also exploit the structure of the data. This article discusses a paradigm for large-scale data analysis based on discrete signal processing (DSP) on graphs (DSPG). DSPG extends signal processing concepts and methodologies from classical signal processing theory to data indexed by general graphs. Big data analysis presents several challenges to DSPG, in particular in filtering and frequency analysis of very large data sets. We review fundamental concepts of DSPG, including graph signals and graph filters, the graph Fourier transform, graph frequency, and spectrum ordering, and compare them with their counterparts from classical signal processing theory. We then consider product graphs as a graph model that helps extend the application of DSPG methods to large data sets through efficient implementation based on parallelization and vectorization. We relate the presented framework to existing methods for large-scale data processing and illustrate it with an application to data compression.
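
As a rough illustration of why product graphs help with scale, the following minimal Python sketch computes a graph Fourier transform on a Kronecker product graph using only the spectra of its two factors. It is not taken from the article: the two cycle-graph factors, the helper names `cycle_adjacency` and `product_gft`, and the restriction to undirected graphs (symmetric adjacency matrices, so the Fourier basis is orthonormal) are assumptions made for the example.

```python
import numpy as np

def cycle_adjacency(n):
    """Adjacency matrix of an undirected cycle on n nodes (hypothetical factor graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return A

def product_gft(V1, V2, s):
    """Graph Fourier transform on the Kronecker product graph, computed
    factor by factor without forming the (n1*n2) x (n1*n2) basis.

    Assumes V1 and V2 have orthonormal columns, so the transform is
    (V1 kron V2)^T s, which equals V1^T S V2 for S = s reshaped
    row-major to n1 x n2.
    """
    n1, n2 = V1.shape[0], V2.shape[0]
    S = s.reshape(n1, n2)
    return (V1.T @ S @ V2).reshape(-1)

# Two small factor graphs and their eigendecompositions.
A1, A2 = cycle_adjacency(4), cycle_adjacency(5)
lam1, V1 = np.linalg.eigh(A1)
lam2, V2 = np.linalg.eigh(A2)

# The product graph has 20 nodes; its spectrum follows from the factors.
A = np.kron(A1, A2)
V = np.kron(V1, V2)          # eigenvectors of the product graph
lam = np.kron(lam1, lam2)    # eigenvalues are pairwise products

# A graph signal on the product graph and its spectrum, computed two ways.
rng = np.random.default_rng(0)
s = rng.standard_normal(A.shape[0])
s_hat_fast = product_gft(V1, V2, s)   # uses only the 4x4 and 5x5 factors
s_hat_full = V.T @ s                  # uses the full 20x20 basis
```

In this sketch the factor-wise transform touches only the small factor matrices, so its cost grows with the factor sizes rather than with the size of the product graph, and the two matrix products can be vectorized or run in parallel. Directed graphs, whose adjacency matrices need not be diagonalizable, would need a more general decomposition than the `eigh` call assumed here.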
