Impression Store: Compressive Sensing-based Storage for Big Data Analytics

For many big data analytics workloads, approximate results suffice. This begs the question, whether and how the underlying system architecture can take advantage of such relaxations, thereby lifting constraints inherent in today's architectures. This position paper explores one of the possible directions. Impression Store is a distributed storage system with the abstraction of big data vectors. It aggregates updates internally and responds to the retrieval of top-K high-value entries. With proper extension, Impression Store supports various aggregations, top-K queries, outlier and major mode detection. While restricted in scope, such queries represent a substantial and important portion of many production workloads. In return, the system has unparalleled scalability; any node in the system can process any query, both reads and updates. The key technique we leverage is compressive sensing, a technique that substantially reduces the amount of active memory state, IO, and traffic volume needed to achieve such scalability.

[1]  Abraham Lempel,et al.  Compression of individual sequences via variable-rate coding , 1978, IEEE Trans. Inf. Theory.

[2]  Ian H. Witten,et al.  Data Compression Using Adaptive Coding and Partial String Matching , 1984, IEEE Trans. Commun..

[3]  Y. C. Pati,et al.  Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition , 1993, Proceedings of 27th Asilomar Conference on Signals, Systems and Computers.

[4]  Peter Deutsch,et al.  DEFLATE Compressed Data Format Specification version 1.3 , 1996, RFC.

[5]  Surajit Chaudhuri,et al.  An overview of query optimization in relational systems , 1998, PODS.

[6]  GhemawatSanjay,et al.  The Google file system , 2003 .

[7]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[8]  David L Donoho,et al.  Compressed sensing , 2006, IEEE Transactions on Information Theory.

[9]  Emmanuel J. Candès,et al.  Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies? , 2004, IEEE Transactions on Information Theory.

[10]  Joel A. Tropp,et al.  Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit , 2007, IEEE Transactions on Information Theory.

[11]  Jingren Zhou,et al.  SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..

[12]  Wilson C. Hsieh,et al.  Bigtable: A Distributed Storage System for Structured Data , 2006, TOCS.

[13]  Ihab F. Ilyas,et al.  A survey of top-k query processing techniques in relational database systems , 2008, CSUR.

[14]  Ting Sun,et al.  Single-pixel imaging via compressive sampling , 2008, IEEE Signal Process. Mag..

[15]  Mircea Andrecut,et al.  Fast GPU Implementation of Sparse Signal Recovery from Random Projections , 2008, Eng. Lett..

[16]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Walter Willinger,et al.  Spatio-temporal compressive sensing and internet traffic matrices , 2009, SIGCOMM '09.

[18]  Liang Chen,et al.  GPU Implementation of Orthogonal Matching Pursuit for Compressive Sensing , 2011, 2011 IEEE 17th International Conference on Parallel and Distributed Systems.

[19]  Nicolas Bruno,et al.  SCOPE: parallel databases meet MapReduce , 2012, The VLDB Journal.

[20]  Jiaxing Zhang,et al.  Distributed Outlier Detection using Compressive Sensing , 2015, SIGMOD Conference.