Automatically Mining Facets for Queries from Their Search Results

We address the problem of finding query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query. We assume that the important aspects of a query are usually presented and repeated in the query’s top retrieved documents in the form of lists, and that query facets can be mined by aggregating these significant lists. We propose a systematic solution, which we refer to as QDMiner, to automatically mine query facets by extracting and grouping frequent lists from free text, HTML tags, and repeat regions within the top search results. Experimental results show that a large number of lists do exist in search results and that useful query facets can be mined by QDMiner. We further analyze the problem of list duplication and find that better query facets can be mined by modeling fine-grained similarities between lists and penalizing duplicated lists.
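
The extract-then-group idea described above can be illustrated with a small Python sketch. It is not QDMiner's implementation. It covers only two of the three list sources: HTML `<ul>`/`<ol>` items and simple comma-separated enumerations in free text. Repeat regions are skipped. Lists are grouped with a greedy Jaccard-threshold rule, and each document is counted at most once per item as a crude stand-in for the paper's duplicate-list penalization. All names (`ListTagParser`, `extract_lists`, `jaccard`, `mine_facets`, `TEXT_LIST`, `SPLIT`) and the parameters `threshold` and `min_support` are hypothetical, and their defaults are illustrative assumptions. The sketch also assumes well-formed HTML with explicit `</li>` tags and ignores nested lists.

```python
import re
from html.parser import HTMLParser

# Free-text enumerations such as "poodle, beagle, and boxer" (single-word items only;
# a deliberately simple pattern assumed for illustration, not QDMiner's patterns).
TEXT_LIST = re.compile(r"\b(\w+(?:, \w+)+,? (?:and|or) \w+)")
SPLIT = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


class ListTagParser(HTMLParser):
    """Collects <li> texts per <ul>/<ol>; assumes explicit </li>, ignores nesting."""

    def __init__(self):
        super().__init__()
        self.lists = []   # finished lists of lowercase item strings
        self._stack = []  # item buffers for currently open <ul>/<ol> tags
        self._in_li = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._stack.append([])
        elif tag == "li" and self._stack:
            self._in_li, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            text = " ".join("".join(self._buf).split()).lower()
            if text:
                self._stack[-1].append(text)
            self._in_li = False
        elif tag in ("ul", "ol") and self._stack:
            items = self._stack.pop()
            if len(items) >= 2:  # a one-item "list" carries no grouping signal
                self.lists.append(items)

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)


def extract_lists(html):
    """Return candidate lists from one document's HTML list tags and free text."""
    parser = ListTagParser()
    parser.feed(html)
    parser.close()
    lists = list(parser.lists)
    text = re.sub(r"<[^>]+>", " ", html).lower()  # crude tag stripping
    for m in TEXT_LIST.finditer(text):
        items = [s.strip() for s in SPLIT.split(m.group(1)) if s.strip()]
        if len(items) >= 2:
            lists.append(items)
    return lists


def jaccard(a, b):
    return len(a & b) / len(a | b)


def mine_facets(docs, threshold=0.3, min_support=2):
    """Group similar lists and keep items supported by >= min_support documents."""
    lists = [(i, set(items)) for i, html in enumerate(docs) for items in extract_lists(html)]
    clusters = []  # greedy single pass (an assumption, not QDMiner's clustering)
    for doc_id, items in sorted(lists, key=lambda x: -len(x[1])):
        for c in clusters:
            if jaccard(items, c["items"]) >= threshold:
                c["items"] |= items
                c["members"].append((doc_id, items))
                break
        else:
            clusters.append({"items": set(items), "members": [(doc_id, items)]})
    facets = []
    for c in clusters:
        # Each document counts at most once per item: a crude duplicate penalty.
        docs_per_item = {}
        for doc_id, items in c["members"]:
            for item in items:
                docs_per_item.setdefault(item, set()).add(doc_id)
        kept = sorted(it for it, ds in docs_per_item.items() if len(ds) >= min_support)
        if kept:
            support = len({d for d, _ in c["members"]})
            facets.append((support, kept))
    facets.sort(key=lambda f: (-f[0], f[1]))  # most widely supported facets first
    return facets


if __name__ == "__main__":
    docs = [
        "<p>Popular breeds include poodle, beagle, and boxer.</p>",
        "<ul><li>Poodle</li><li>Beagle</li><li>Boxer</li></ul>"
        "<ul><li>Small</li><li>Medium</li><li>Large</li></ul>",
        "<p>Dogs come in small, medium or large sizes; we love beagle, poodle and pug.</p>",
    ]
    for support, items in mine_facets(docs):
        print(support, items)
    # 3 ['beagle', 'boxer', 'poodle']
    # 2 ['large', 'medium', 'small']
```

In this toy run, "pug" joins the breed group but is dropped from the facet because only one document mentions it. Raising `threshold` yields tighter groups. Raising `min_support` keeps only items that recur across more results.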
