Event Processing over a Distributed JSON Store: Design and Performance

Web applications are increasingly built to target both desktop and mobile users. As a result, modern Web development infrastructure must be able to process large numbers of events (e.g., for location-based features) and support analytics over those events, with applications ranging from banking (e.g., fraud detection) to retail (e.g., just-in-time personalized promotions). We describe a system designed specifically for such applications, combining high-throughput event processing with analytics. Our main contribution is the design and implementation of an in-memory JSON store that can handle both event and analytics workloads. The store relies on the JSON model in order to serve data through a common Web API. Thanks to the flexibility of the JSON model, the store can integrate data from systems of record (e.g., customer profiles) with data transmitted between the server and a large number of clients (e.g., location-based events or transactions). For performance, the proposed store is built over a distributed, transactional, in-memory object cache. Our experiments show that our implementation achieves high throughput and low latency without sacrificing scalability.
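To make the idea of one JSON store serving both event and analytics workloads concrete, the following is a minimal single-process sketch. It is not the system described above: the `JsonStore` class, its `put`/`get`/`find` methods, the collection names, and the `total_spend` helper are all hypothetical, and a plain lock stands in for the distributed, transactional cache.

```python
import json
from collections import defaultdict
from threading import Lock


class JsonStore:
    """Toy in-memory JSON document store (hypothetical API, single process)."""

    def __init__(self):
        # collection name -> {key: JSON document}
        self._collections = defaultdict(dict)
        # Assumption: a lock stands in for the transactional cache layer.
        self._lock = Lock()

    def put(self, collection, key, document):
        # Round-trip through json to reject non-JSON values and store a copy.
        doc = json.loads(json.dumps(document))
        with self._lock:
            self._collections[collection][key] = doc

    def get(self, collection, key):
        with self._lock:
            return self._collections[collection].get(key)

    def find(self, collection, predicate):
        # Snapshot the documents under the lock, then filter outside it.
        with self._lock:
            docs = list(self._collections[collection].values())
        return [d for d in docs if predicate(d)]


def total_spend(store, customer):
    """Analytics-style query: sum of purchase amounts for one customer."""
    purchases = store.find(
        "events",
        lambda e: e["customer"] == customer and e["type"] == "purchase",
    )
    return sum(e["amount"] for e in purchases)


store = JsonStore()
# Data from a system of record and client events share one schema-free model.
store.put("profiles", "c1", {"name": "Ada", "tier": "gold"})
store.put("events", "e1", {"customer": "c1", "type": "purchase", "amount": 120.0})
store.put("events", "e2", {"customer": "c1", "type": "location", "lat": 48.85, "lon": 2.35})
store.put("events", "e3", {"customer": "c2", "type": "purchase", "amount": 30.0})
```

Here `total_spend(store, "c1")` aggregates over the same documents that the event path writes, which is the property the abstract emphasizes. A real deployment would partition the collections across nodes and replace the lock with the cache's transactions.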
