Streaming the Web: Reasoning over dynamic data

In the last few years, a new research area called stream reasoning has emerged to bridge the gap between reasoning and stream processing. While current reasoning approaches are designed to work mainly on static data, the Web is extremely dynamic: information is frequently changed and updated, and new data is continuously generated from a huge number of sources, often at a high rate. In other words, fresh information is constantly made available in the form of streams of new data and updates. Despite some promising investigations in the area, stream reasoning is still in its infancy, both from the perspective of model and theory development and from the perspective of system and tool design and implementation. The aim of this paper is threefold: (i) we identify the requirements coming from different application scenarios, and we isolate the problems they pose; (ii) we survey existing approaches and proposals in the area of stream reasoning, highlighting their strengths and limitations; (iii) we draw a research agenda to guide future research and development of stream reasoning. In doing so, we also analyze related research fields to extract algorithms, models, techniques, and solutions that could be useful in the area of stream reasoning.
