Ten thousand SQLs

Keyword search in relational databases has been extensively studied. Given a relational database, a keyword query finds a set of interconnected tuple structures whose tuples are connected by foreign key references. On an RDBMS, a keyword query is processed in two steps, candidate network (CN) generation and CN evaluation, where each CN is an SQL query. A keyword query commonly requires processing over 10,000 SQLs. Several approaches exist for processing a keyword query on an RDBMS, but the performance achievable on a uniprocessor architecture is limited. In this paper, we study parallel computation of keyword queries on a multicore architecture. We make three observations about keyword query computation: a large number of SQLs must be processed, the SQLs have high potential for sharing, and the intermediate results are large while the final results are few. All three make parallel keyword query computation challenging. We investigate three approaches. We first study query-level parallelism, where each SQL is processed by one core. We distribute the SQLs across cores based on three objectives: minimizing workload skew, minimizing inter-core sharing, and maximizing intra-core sharing. Such an approach risks load imbalance, because errors in cost estimation accumulate. We then study operation-level parallelism, where each operation of an SQL is processed by one core. All operations are processed in stages, and in each stage the costs of the operations are re-estimated to reduce the accumulated error. Operation-level parallelism still suffers from workload skew when large operations are involved and a large number of cores are used. Finally, we propose a new algorithm that partitions relations adaptively to minimize the extra cost of partitioning while reducing workload skew.
We conducted extensive performance studies on two large real datasets, DBLP and IMDB, and report the efficiency of our approaches.
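
To make the first objective of query-level parallelism concrete, the sketch below assigns SQLs to cores so that the estimated workload is roughly balanced. It is not the algorithm of this paper: it uses a standard greedy longest-processing-time-first heuristic, it ignores the two sharing objectives, and the function name `assign_sqls_to_cores` and the cost values are hypothetical. It assumes that an estimated cost is already available for every SQL.

```python
import heapq


def assign_sqls_to_cores(estimated_costs, num_cores):
    """Greedy longest-processing-time-first assignment (illustrative only).

    estimated_costs: list of estimated costs, one per SQL (assumed given).
    num_cores: number of cores to distribute the SQLs over.
    Returns (assignment, loads), where assignment[c] lists the SQL indices
    placed on core c and loads[c] is the summed estimated cost on core c.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")

    # Min-heap of (current estimated load, core id): the least-loaded core
    # is always at the top.
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_cores)]

    # Place the most expensive SQLs first; each goes to the core that
    # currently has the smallest estimated load.
    order = sorted(range(len(estimated_costs)),
                   key=lambda i: estimated_costs[i], reverse=True)
    for sql_id in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(sql_id)
        heapq.heappush(heap, (load + estimated_costs[sql_id], core))

    loads = [sum(estimated_costs[i] for i in ids) for ids in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical cost estimates for six SQLs, distributed over two cores.
    costs = [7, 5, 4, 3, 3, 2]
    assignment, loads = assign_sqls_to_cores(costs, 2)
    print(assignment, loads)
```

Because the assignment is fixed up front from estimated costs, any estimation error carries through to the actual per-core load, which is the risk of load imbalance noted above for query-level parallelism.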
