Upsortable: Programming TopK Queries Over Data Streams

Top-k queries over data streams are a well-studied problem. Numerous systems exist that process continuous queries over sliding windows. In contrast, non-append-only streams call for ad hoc solutions, e.g. tailor-made solutions implemented in a mainstream programming language. Meanwhile, Java 8 added the Stream API and lambda expressions, giving the language powerful operations for data stream processing. However, the Java Collections Framework does not provide data structures that safely and conveniently support sorted collections of evolving data. In this paper, we demonstrate Upsortable, an annotation-based approach that allows developers to use existing sorted collections from the standard Java API for dynamic data management. Our approach relies on a combination of pre-compilation abstract syntax tree modifications and runtime analysis of bytecode. Upsortable offers the developer a safe and time-efficient solution for developing top-k queries on data streams while keeping full compatibility with standard Java.
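
To make the difficulty with sorted collections of evolving data concrete, the following minimal sketch shows the manual discipline a standard `TreeSet` requires when the field used for ordering changes: the element must be removed before the update and reinserted afterwards, otherwise the tree can no longer locate it. This is only an illustration of the problem in plain Java, not Upsortable's implementation; the names `TopKSketch`, `Player`, `updateScore` and `topK` are hypothetical and do not come from the paper.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class TopKSketch {
    /** Hypothetical stream item whose score evolves over time. */
    static final class Player {
        final String name;
        int score;
        Player(String name, int score) { this.name = name; this.score = score; }
    }

    // Highest score first; ties are broken by name so distinct players never compare as equal.
    static final Comparator<Player> BY_SCORE_DESC =
        Comparator.comparingInt((Player p) -> p.score).reversed()
                  .thenComparing((Player p) -> p.name);

    static final TreeSet<Player> ranking = new TreeSet<>(BY_SCORE_DESC);

    /** Remove before mutating and reinsert afterwards, so the tree never holds a misplaced element. */
    static void updateScore(Player p, int newScore) {
        ranking.remove(p);   // located using the old score
        p.score = newScore;
        ranking.add(p);      // positioned using the new score
    }

    /** Names of the k best players, read with the Java 8 Stream API. */
    static List<String> topK(int k) {
        return ranking.stream().limit(k).map(p -> p.name).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Player a = new Player("a", 10), b = new Player("b", 20), c = new Player("c", 30);
        ranking.add(a); ranking.add(b); ranking.add(c);
        updateScore(a, 40);
        System.out.println(topK(2)); // prints [a, c]
    }
}
```

Assigning `p.score` directly while `p` is still in the set would leave the element at its old position, and later `remove` or `contains` calls could fail to find it. Forgetting the remove and reinsert steps around an update is the kind of error the abstract's call for a safe and convenient solution refers to.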
