Virtual lightweight snapshots for consistent analytics in NoSQL stores

Applications that deal with big data increasingly need to run analytics concurrently with updates. Bridging the gap between big and fast data is challenging, however: most of these applications require analytics results that are fresh and consistent, without degrading system latency or throughput. We propose virtual lightweight snapshots (VLS), a mechanism that enables consistent analytics in NoSQL stores without blocking incoming updates. VLS requires neither native support for database versioning nor a transaction manager. It is also storage-efficient: it keeps additional versions of records only when needed to guarantee consistency, and it shares versions across multiple concurrent snapshots. We describe an implementation of VLS in MongoDB and present a detailed experimental evaluation, which shows that it supports consistent analytics with small impact on query evaluation time, update throughput, and latency.
