An Empirical Performance Evaluation of Relational Keyword Search Techniques

IEEE Transactions on Knowledge and Data Engineering

Extending the keyword search paradigm to relational data has been an active area of research within the database and IR communities during the past decade. Many approaches have been proposed, but despite numerous publications, there remains a severe lack of standardization for the evaluation of proposed search techniques. This lack of standardization has resulted in contradictory results from different evaluations, and the numerous discrepancies obscure what advantages the different approaches actually offer. In this paper, we present the most extensive empirical performance evaluation of relational keyword search techniques to appear to date in the literature. Our results indicate that many existing search techniques do not provide acceptable performance for realistic retrieval tasks. In particular, memory consumption precludes many search techniques from scaling beyond small data sets with tens of thousands of vertices. We also explore the relationship between execution time and factors varied in previous evaluations; our analysis indicates that most of these factors have relatively little impact on performance. In summary, our work confirms previous claims regarding the unacceptable performance of these search techniques and underscores the need for standardization in evaluations, of the kind exemplified by the IR community.
