Enhancing Search with Structure

Keyword search has traditionally focussed on retrieving documents in ranked order, given simple keyword queries. Similarly, work on keyword queries on structured data has focussed on retrieving closely connected pieces of data that together contain the given query keywords. In recent years, there has been a good deal of work that attempts to go beyond these paradigms to improve the search experience on unstructured textual data as well as on structured or semi-structured data. In this paper, we survey recent work on adding structure to keyword search, categorizing it along three axes: (a) adding structure to unstructured data, (b) adding structure to answers, and (c) adding structure to queries, allowing more power than simple keyword queries while avoiding the complexity of elaborate query languages that demand extensive schema knowledge.
