Supporting Search-As-You-Type Using SQL in Databases

A search-as-you-type system computes answers on-the-fly as a user types in a keyword query character by character. We study how to support search-as-you-type on data residing in a relational DBMS. We focus on how to support this type of search using the native database language, SQL. A main challenge is how to leverage existing database functionalities to meet the high-performance requirement to achieve an interactive speed. We study how to use auxiliary indexes stored as tables to increase search performance. We present solutions for both single-keyword queries and multikeyword queries, and develop novel techniques for fuzzy search using SQL by allowing mismatches between query keywords and answers. We present techniques to answer first-N queries and discuss how to support updates efficiently. Experiments on large, real data sets show that our techniques enable DBMS systems on a commodity computer to support search-as-you-type on tables with millions of records.

[1]  Divesh Srivastava,et al.  Fast Indexes and Algorithms for Set Similarity Selection Queries , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[2]  Chengqi Zhang,et al.  Efficient approximate entity extraction with edit distance constraints , 2009, SIGMOD Conference.

[3]  Guoliang Li,et al.  Efficient interactive fuzzy keyword search , 2009, WWW '09.

[4]  Feifei Li,et al.  Probabilistic string similarity joins , 2010, SIGMOD Conference.

[5]  Hao Wu,et al.  DBease: Making Databases User-Friendly and Easily Accessible , 2011, CIDR.

[6]  Surajit Chaudhuri,et al.  DBXplorer: a system for keyword-based search over relational databases , 2002, Proceedings 18th International Conference on Data Engineering.

[7]  Bin Wang,et al.  VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams , 2007, VLDB.

[8]  Clement T. Yu,et al.  Effective keyword search in relational databases , 2006, SIGMOD Conference.

[9]  Xuemin Lin,et al.  SPARK2: Top-k Keyword Query in Relational Databases , 2007, IEEE Transactions on Knowledge and Data Engineering.

[10]  Ingmar Weber,et al.  The CompleteSearch Engine: Interactive, Efficient, and Towards IR& DB Integration , 2007, CIDR.

[11]  S. Sudarshan,et al.  Keyword searching and browsing in databases using BANKS , 2002, Proceedings 18th International Conference on Data Engineering.

[12]  Jae-Gil Lee,et al.  n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure , 2005, VLDB.

[13]  H. V. Jagadish,et al.  Effective Phrase Prediction , 2007, VLDB.

[14]  Jeffrey Xu Yu,et al.  Ten thousand SQLs , 2010, Proc. VLDB Endow..

[15]  Ingmar Weber,et al.  Type less, find more: fast autocompletion search with a succinct index , 2006, SIGIR.

[16]  Shan Wang,et al.  Finding Top-k Min-Cost Connected Trees in Databases , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[17]  Jianyong Wang,et al.  Providing built-in keyword search capabilities in RDBMS , 2011, The VLDB Journal.

[18]  Marianne Winslett,et al.  Trustworthy keyword search for compliance storage , 2008, The VLDB Journal.

[19]  Robert B. Miller,et al.  Response time in man-computer conversational transactions , 1899, AFIPS Fall Joint Computing Conference.

[20]  Surajit Chaudhuri,et al.  A Primitive Operator for Similarity Joins in Data Cleaning , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[21]  Surajit Chaudhuri,et al.  Scalable ad-hoc entity extraction from text collections , 2008, Proc. VLDB Endow..

[22]  Haofen Wang,et al.  Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[23]  S. Sudarshan,et al.  Bidirectional Expansion For Keyword Search on Graph Databases , 2005, VLDB.

[24]  Surajit Chaudhuri,et al.  Extending autocompletion to tolerate errors , 2009, SIGMOD Conference.

[25]  Xuemin Lin,et al.  Top-k Set Similarity Joins , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[26]  Anthony K. H. Tung,et al.  Relaxing join and selection queries , 2006, VLDB.

[27]  Xuemin Lin,et al.  Ed-Join: an efficient algorithm for similarity joins with edit distance constraints , 2008, Proc. VLDB Endow..

[28]  Rajeev Motwani,et al.  Robust and efficient fuzzy match for online data cleaning , 2003, SIGMOD '03.

[29]  Esko Ukkonen,et al.  Finding Approximate Patterns in Strings , 1985, J. Algorithms.

[30]  Guoliang Li,et al.  Efficient type-ahead search on relational data: a TASTIER approach , 2009, SIGMOD Conference.

[31]  Fabian M. Suchanek,et al.  ESTER: efficient search on text, entities, and relations , 2007, SIGIR.

[32]  Kyuseok Shim,et al.  Power-Law Based Estimation of Set Similarity Join Size , 2009, Proc. VLDB Endow..

[33]  Xiaohui Yu,et al.  Hashed samples: selectivity estimators for set similarity selection queries , 2008, Proc. VLDB Endow..

[34]  Jeffrey Xu Yu,et al.  Keyword search in databases: the power of RDBMS , 2009, SIGMOD Conference.

[35]  Raghav Kaushik,et al.  Efficient exact set-similarity joins , 2006, VLDB.

[36]  Roberto J. Bayardo,et al.  Scaling up all pairs similarity search , 2007, WWW '07.

[37]  Surajit Chaudhuri,et al.  An efficient filter for approximate membership checking , 2008, SIGMOD Conference.

[38]  Luis Gravano,et al.  Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.

[39]  Surajit Chaudhuri,et al.  Data cleaning in microsoft SQL server 2005 , 2005, SIGMOD '05.

[40]  Jiaheng Lu,et al.  Efficient Merging and Filtering Algorithms for Approximate String Searches , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[41]  Philip S. Yu,et al.  BLINKS: ranked keyword searches on graphs , 2007, SIGMOD '07.

[42]  S. Sudarshan,et al.  Keyword search on external memory data graphs , 2008, Proc. VLDB Endow..

[43]  Sunita Sarawagi,et al.  Efficient set joins on similarity predicates , 2004, SIGMOD '04.

[44]  Guoliang Li,et al.  Trie-join , 2010, Proc. VLDB Endow..

[45]  Divesh Srivastava,et al.  Incremental maintenance of length normalized indexes for approximate string matching , 2009, SIGMOD Conference.

[46]  Jeffrey Xu Yu,et al.  Efficient similarity joins for near-duplicate detection , 2011, TODS.

[47]  Beng Chin Ooi,et al.  EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data , 2008, SIGMOD Conference.

[48]  Rajeev Motwani,et al.  Robust identification of fuzzy duplicates , 2005, 21st International Conference on Data Engineering (ICDE'05).

[49]  Vagelis Hristidis,et al.  DISCOVER: Keyword Search in Relational Databases , 2002, VLDB.

[50]  Kyuseok Shim,et al.  Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance , 2007, VLDB.