Modeling and Optimization for Big Data Analytics: (Statistical) learning tools for our era of data deluge

With pervasive sensors continuously collecting and storing massive amounts of information, there is no doubt this is an era of data deluge. Learning from these large volumes of data is expected to bring significant science and engineering advances along with improvements in quality of life. However, with such a big blessing come big challenges. Running analytics on voluminous data sets by central processors and storage units seems infeasible, and with the advent of streaming data sources, learning must often be performed in real time, typically without a chance to revisit past entries. Workhorse signal processing (SP) and statistical learning tools have to be re-examined in today's high-dimensional data regimes. This article contributes to the ongoing cross-disciplinary efforts in data science by putting forth encompassing models capturing a wide range of SP-relevant data analytic tasks, such as principal component analysis (PCA), dictionary learning (DL), compressive sampling (CS), and subspace clustering. It offers scalable architectures and optimization algorithms for decentralized and online learning problems, while revealing fundamental insights into the various analytic and implementation tradeoffs involved. Extensions of the encompassing models to timely data-sketching, tensor- and kernel-based learning tasks are also provided. Finally, the close connections of the presented framework with several big data tasks, such as network visualization, decentralized and dynamic estimation, prediction, and imputation of network link load traffic, as well as imputation in tensor-based medical imaging, are highlighted.
