Top-K queries in RDF graph-based stream processing with actors

In this paper, we describe RGraSPA (RDF Graph-based Stream Processing with Actors), a novel system that builds on RDF graphs and knowledge reasoning and uses an actor model to distribute continuous queries. We also present our approach to solving the DEBS Grand Challenge with this system. RGraSPA uses an RDF graph-based event model to encapsulate a set of triples and process them in a continuous manner. We further present our synchronised structure traversal algorithm, which uses a Range tree to store results in a sorted view; each node of the tree maintains a balanced Multimap Binary Search Tree (BST). The range of each node is adaptive and is updated according to the incoming values and the defined size of the node's Multimap BST. To solve the DEBS challenge, we provide a formal method for calculating cell IDs from longitude and latitude in a streaming fashion, and we use two Range trees, one for the 10 most frequent routes and one for the 10 most profitable areas. Our experimental results show that query execution time can be optimised by carefully adjusting the cardinality values of the Range tree. Our solution processes one year's worth of RDFised taxi data (372 GB, approximately 3.4 billion triples) in 1.8 hours.
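The sorted view described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a flat list of range buckets instead of a tree, replaces each balanced Multimap BST with a Python dictionary that is sorted on demand, and splits a bucket at its median score when it exceeds a hypothetical `capacity`. The names `RangeBuckets`, `update`, and `top_k` are illustrative only.

```python
import bisect
from collections import defaultdict


class RangeBuckets:
    """Sorted view of (item, score) pairs kept in adaptive score ranges.

    Assumption: bucket i covers scores in [lows[i], lows[i + 1]).
    Each bucket is a multimap from score to the set of items with that score.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity            # max distinct scores per bucket
        self.lows = [float("-inf")]         # lower bound of each bucket's range
        self.buckets = [defaultdict(set)]   # one multimap per range
        self.score_of = {}                  # current score of every item

    def _index(self, score):
        # Last bucket whose lower bound is <= score.
        return bisect.bisect_right(self.lows, score) - 1

    def update(self, item, score):
        """Insert an item or move it to the range matching its new score."""
        old = self.score_of.get(item)
        if old is not None:
            bucket = self.buckets[self._index(old)]
            bucket[old].discard(item)
            if not bucket[old]:
                del bucket[old]
        self.score_of[item] = score
        i = self._index(score)
        self.buckets[i][score].add(item)
        self._maybe_split(i)

    def _maybe_split(self, i):
        # Adapt the ranges: split an over-full bucket at its median score.
        bucket = self.buckets[i]
        scores = sorted(bucket)
        if len(scores) <= self.capacity:
            return
        half = len(scores) // 2
        upper = defaultdict(set)
        for s in scores[half:]:
            upper[s] = bucket.pop(s)
        self.lows.insert(i + 1, scores[half])
        self.buckets.insert(i + 1, upper)

    def top_k(self, k):
        """Return up to k (item, score) pairs, highest score first."""
        out = []
        for bucket in reversed(self.buckets):
            for s in sorted(bucket, reverse=True):
                for item in sorted(bucket[s]):   # ties broken by item name
                    out.append((item, s))
                    if len(out) == k:
                        return out
        return out


view = RangeBuckets(capacity=2)
for route, count in [("a", 1), ("b", 5), ("c", 3), ("d", 7), ("a", 9)]:
    view.update(route, count)
print(view.top_k(2))
```

Because the highest ranges are visited first, a top-k read touches only the buckets that hold the largest scores. The sketch never merges buckets that become empty, which a production structure would likely need to do.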
