An Efficient Parallel Keyword Search Engine on Knowledge Graphs

Keyword search has recently become popular as a way to query relational databases, and even graphs, since it allows users to issue queries without learning a complex query language and data schema. Evaluating a keyword query is usually significantly more expensive than evaluating an equivalent selection query, since the query specification is less complete, and many alternative answers have to be considered by the system, requiring considerable effort to generate and compare. Current interest in big data and AI are putting even more demands on the efficiency of keyword search. In particular, searching of knowledge graphs is gaining popularity. As knowledge graphs often comprise many millions of nodes and edges, performing real-time search on graphs of this size is an open challenge. In this paper, we attempt to address this need by leveraging advances in hardware technologies, e.g. multi-core CPUs and GPUs. Specifically, we implement a parallel keyword search engine for Knowledge Bases (KB). To be able to do so, and to exploit parallelism, we devise a new approach to keyword search, based on a concept we introduce called Central Graph. Unlike the Group Steiner Tree (GST) model, widely used for keyword search, our approach can naturally work in parallel and still return compact answer graphs with rich information. Our approach can work in either multi-core CPUs or a single GPU. In particular, our GPU implementation is two to three orders of magnitudes faster than state-of-the-art keyword search method. We conduct extensive experiments to show that our approach is both efficient and effective.

[1]  Edmund Ihler,et al.  The Complexity of Approximating the Class Steiner Tree Problem , 1991, WG.

[2]  Yufei Tao,et al.  Querying Communities in Relational Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[3]  Jianlong Zhong,et al.  Parallel Graph Processing on Graphics Processors Made Easy , 2013, Proc. VLDB Endow..

[4]  Kunle Olukotun,et al.  Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[5]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[6]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[7]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[8]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[9]  Ge Yu,et al.  Efficient Keyword Search for SLCA in Parallel XML Databases , 2011, 2011 Eighth Web Information Systems and Applications Conference.

[10]  Bo Ning,et al.  Parallel processing the keyword search in uncertain environment , 2012, 2012 International Conference on System Science and Engineering (ICSSE).

[11]  Martin Berzins,et al.  Parallel Breadth First Search on GPU clusters , 2014, 2014 IEEE International Conference on Big Data (Big Data).

[12]  Jeffrey Xu Yu,et al.  Efficient and Progressive Group Steiner Tree Search , 2016, SIGMOD Conference.

[13]  Andrew S. Grimshaw,et al.  Scalable GPU graph traversal , 2012, PPoPP '12.

[14]  Beng Chin Ooi,et al.  EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data , 2008, SIGMOD Conference.

[15]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[16]  Kian-Lee Tan,et al.  Real time personalized search on social networks , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[17]  S. Sudarshan,et al.  Keyword search on external memory data graphs , 2008, Proc. VLDB Endow..

[18]  Feifei Li,et al.  Scalable Keyword Search on Large RDF Data , 2014, IEEE Transactions on Knowledge and Data Engineering.

[19]  Anthony K. H. Tung,et al.  A Generic Inverted Index Framework for Similarity Search on the GPU , 2016, 2018 IEEE 34th International Conference on Data Engineering (ICDE).

[20]  Aijun An,et al.  Keyword Search in Graphs: Finding r-cliques , 2011, Proc. VLDB Endow..

[21]  S. Sudarshan,et al.  BANKS: Browsing and Keyword Searching in Relational Databases , 2002, VLDB.

[22]  Surajit Chaudhuri,et al.  DBXplorer: enabling keyword search over relational databases , 2002, SIGMOD '02.

[23]  Michael Günther,et al.  Introducing Wikidata to the Linked Data Web , 2014, SEMWEB.

[24]  Martin D. F. Wong,et al.  An effective GPU implementation of breadth-first search , 2010, Design Automation Conference.

[25]  Vagelis Hristidis,et al.  ObjectRank: Authority-Based Keyword Search in Databases , 2004, VLDB.

[26]  Kamesh Madduri,et al.  Parallel breadth-first search on distributed memory systems , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[27]  Kian-Lee Tan,et al.  Context-aware advertisement recommendation for high-speed social news feeding , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[28]  H. Howie Huang,et al.  iBFS: Concurrent Breadth-First Search on GPUs , 2016, SIGMOD Conference.

[29]  Jeffrey Xu Yu,et al.  Ten thousand SQLs , 2010, Proc. VLDB Endow..

[30]  Thomas Pellissier Tanon,et al.  From Freebase to Wikidata: The Great Migration , 2016, WWW.