Keyword search on structured and semi-structured data

Letting users access databases with simple keywords spares them the steep learning curve of mastering a structured query language and understanding complex, possibly fast-evolving data schemas. In this tutorial, we give an overview of state-of-the-art techniques for supporting keyword search on structured and semi-structured data, including query result definition, ranking functions, result generation and top-k query processing, snippet generation, result clustering, query cleaning, performance optimization, and search quality evaluation. We discuss various data models, including relational data, XML data, graph-structured data, data streams, and workflows. We also discuss applications built upon keyword search, such as keyword-based database selection, query generation, and analytical processing. Finally, we identify challenges and opportunities for future research to advance the field.
