Multi-metric Graph Query Performance Prediction

We propose a general framework for predicting graph query performance with respect to three metrics: execution time, query answer quality, and memory consumption. The learning framework derives informative statistics from the data and the query structure and uses a multi-label regression model to predict multi-metric query performance. We apply the framework to two common graph query classes, reachability and graph pattern matching, which differ significantly in query complexity. For both query classes, we develop suitable performance models and learning algorithms. We demonstrate the efficacy of the framework through experiments on real-world information and social networks. Furthermore, leveraging the framework, we propose a novel workload optimization algorithm and show that it improves the efficiency of workload management by 54% on average.
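
To make the idea of predicting several metrics from data and query statistics concrete, here is a minimal sketch. It is not the paper's model: the feature set, the function names (`query_features`, `fit_multi_metric`, `predict`), the use of plain ridge regression, and the synthetic training data are all assumptions chosen for illustration. Each column of the weight matrix corresponds to one metric (execution time, answer quality, memory consumption).

```python
import numpy as np

def query_features(num_nodes, num_edges, query_nodes, query_edges):
    """Hypothetical feature vector mixing data-graph and query statistics."""
    avg_degree = 2.0 * num_edges / num_nodes
    # Leading 1.0 is the intercept term.
    return np.array([1.0, np.log(num_nodes), avg_degree, query_nodes, query_edges])

def fit_multi_metric(X, Y, ridge=1e-3):
    """Ridge regression with one weight column per performance metric.

    X: (n_samples, n_features), Y: (n_samples, n_metrics).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)

def predict(W, x):
    """Return one predicted value per metric for feature vector x."""
    return x @ W

# Synthetic, noise-free training set (an assumption; no real measurements).
rng = np.random.default_rng(0)
rows = []
for _ in range(50):
    n = int(rng.integers(100, 10000))      # nodes in the data graph
    m = int(rng.integers(n, 5 * n))        # edges in the data graph
    qn = int(rng.integers(2, 8))           # nodes in the query
    qe = int(rng.integers(qn - 1, 2 * qn)) # edges in the query
    rows.append(query_features(n, m, qn, qe))
X = np.vstack(rows)

# Made-up "true" weights; columns stand for time, quality, and memory.
W_true = rng.normal(size=(5, 3))
Y = X @ W_true

W = fit_multi_metric(X, Y, ridge=1e-8)
pred = predict(W, X[0])
print("predicted [time, quality, memory]:", pred)
```

A single linear model with a shared feature vector is the simplest way to predict all three metrics at once; a practical system would likely swap in a stronger regressor and features tailored to each query class.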
