A learning approach for query planning on spatio-temporal IoT data

The rapid growth of the Internet of Things (IoT) has attracted considerable research attention from the Semantic Web community, which aims to address the challenge of poor interoperability. However, our survey of existing work shows that the goal of providing an intelligent processing and analysis engine for IoT has not yet been fully achieved. Central to this problem is the need for a semantic spatio-temporal query processing engine that can not only analyze spatio-temporal correlations in massive amounts of IoT data, but also generate an effective query plan so that a given query executes in a timely manner. Query planning for multidimensional data such as IoT data is a costly operation. The best-known techniques rely either on a cost model or on spatio-temporal data statistics and heuristics. In this paper, we propose an alternative solution that combines query similarity identification with machine learning techniques to recommend a previously generated query plan to the optimizer for a given query. Our approach also aims to predict query execution time for the purposes of workload management and capacity planning. Our experiments indicate that the learning approach is efficient and achieves high prediction accuracy on test queries.
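
As a rough illustration of the idea of reusing plans from similar past queries, the following minimal sketch applies a plain k-nearest-neighbor lookup over a small query history. It is not the method evaluated in the paper: the feature vector (number of triple patterns, spatial filters, and temporal filters), the plan labels, and the timings are all hypothetical values chosen for the example.

```python
import math
from collections import Counter

# Hypothetical history of executed queries:
# (feature vector, plan label, execution time in ms).
# Assumed features: (triple patterns, spatial filters, temporal filters).
HISTORY = [
    ((3, 1, 0), "spatial_index_first", 120.0),
    ((4, 1, 0), "spatial_index_first", 150.0),
    ((3, 0, 1), "temporal_index_first", 80.0),
    ((5, 0, 2), "temporal_index_first", 110.0),
    ((8, 2, 2), "hash_join_then_filter", 900.0),
]


def k_nearest(features, history, k):
    """Return the k history records closest to `features` (Euclidean distance)."""
    return sorted(history, key=lambda rec: math.dist(features, rec[0]))[:k]


def recommend_plan(features, history=HISTORY, k=3):
    """Recommend the plan label used most often among the k most similar queries."""
    neighbours = k_nearest(features, history, k)
    votes = Counter(plan for _, plan, _ in neighbours)
    return votes.most_common(1)[0][0]


def predict_time(features, history=HISTORY, k=3):
    """Predict execution time as the mean time of the k most similar queries."""
    neighbours = k_nearest(features, history, k)
    return sum(t for _, _, t in neighbours) / len(neighbours)


if __name__ == "__main__":
    new_query = (4, 2, 0)  # hypothetical feature vector of an incoming query
    print(recommend_plan(new_query))
    print(round(predict_time(new_query), 1))
```

In a real system the similarity measure would more likely operate on the query structure itself (for example, a graph-based distance between query patterns) rather than on three hand-picked counts, and the regression step could be replaced by any learned model.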
