Automated Real-Time Analysis of Streaming Big and Dense Data on Reconfigurable Platforms

We propose SSketch, a novel automated framework for efficient analysis of dynamic big data with dense (non-sparse) correlation matrices on reconfigurable platforms. SSketch targets streaming applications in which each data sample can be processed only once and storage is severely limited. The framework learns adaptively from the input stream and updates a corresponding ensemble of lower-dimensional data structures, also known as a sketch matrix. We introduce a new sketching methodology that transforms big data with dense correlations into an ensemble of lower-dimensional subspaces, in a form suited to acceleration on reconfigurable hardware. The method is scalable, significantly reduces costly memory interactions, and improves matrix computation performance by exploiting the coarse-grained parallelism present in the dataset. SSketch provides an automated optimization methodology that creates the most accurate data sketch for a given set of user-defined constraints, including runtime and power, as well as platform constraints such as memory. To facilitate automation, SSketch takes a Hardware/Software (HW/SW) co-design approach: it provides an Application Programming Interface (API) that can be customized for rapid prototyping of an arbitrary matrix-based data analysis algorithm. Proof-of-concept evaluations on a variety of visual datasets with more than 11 million non-zeros show that our hardware-accelerated realization of SSketch achieves up to a 200-fold speedup over a software-based deployment on a general-purpose processor.
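
To make the single-pass, bounded-memory setting concrete, the following is a minimal illustration of streaming matrix sketching. It is not the SSketch algorithm, which the abstract does not specify; it uses the Frequent Directions method (Liberty's deterministic matrix sketching) as a stand-in. The function name `frequent_directions` and the parameter `ell` (number of sketch rows) are assumptions chosen for this example, and the code assumes `ell` does not exceed the sample dimension.

```python
import numpy as np


def frequent_directions(rows, ell):
    """Single-pass sketch of a row stream (Frequent Directions, not SSketch).

    Each row is seen exactly once and only an (ell x d) buffer is stored.
    Assumes ell <= d, where d is the length of each row.
    """
    sketch = None
    next_free = 0  # index of the first all-zero row in the sketch
    for row in rows:
        row = np.asarray(row, dtype=float)
        if sketch is None:
            if ell > row.shape[0]:
                raise ValueError("this example assumes ell <= d")
            sketch = np.zeros((ell, row.shape[0]))
        if next_free == ell:
            # Buffer is full: rotate onto singular directions and shrink.
            _, s, vt = np.linalg.svd(sketch, full_matrices=False)
            delta = s[ell // 2] ** 2
            shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            sketch = shrunk[:, None] * vt
            # Rows ell//2 .. ell-1 are now exactly zero and can be reused.
            next_free = ell // 2
        sketch[next_free] = row
        next_free += 1
    return sketch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((200, 20))  # 200 streamed samples, 20 features
    sketch = frequent_directions(data, ell=8)
    error = np.linalg.norm(data.T @ data - sketch.T @ sketch, 2)
    bound = 2 * np.linalg.norm(data, "fro") ** 2 / 8
    print(sketch.shape, error <= bound)
```

The sketch approximates the correlation structure of the stream: for this method, the spectral-norm gap between the Gram matrices of the data and of the sketch is at most twice the squared Frobenius norm of the data divided by `ell`, so a larger `ell` trades memory for accuracy.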
