Paper information - What was the Query? Generating Queries for Document Sets with Applications in Cluster Labeling

We address the task of generating a query that retrieves a given set of documents. In its abstract form, this task can be seen as “compressing” the document set into a short query. The task also has a real-world application: cluster labeling (e.g., for faceted search). Our solution to cluster labeling is to use queries that approximately retrieve a cluster’s documents. To be generalizable, our approach does not require access to a search index, only to a public interface such as an API. As a result, our approach can also be implemented on the client side.
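
To make the idea of a query that approximately retrieves a document set concrete, here is a minimal sketch. It is not the paper's algorithm: it assumes a greedy term selection that maximizes F1 between the retrieved set and the target set, and it replaces the public search API with a toy in-memory function. The names `CORPUS`, `search`, `f1`, and `generate_query` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Toy corpus standing in for a remote search engine (hypothetical data).
CORPUS = {
    "d1": "jaguar car speed engine",
    "d2": "jaguar car dealer price",
    "d3": "jaguar cat jungle habitat",
    "d4": "jaguar cat prey jungle",
    "d5": "python snake jungle",
}


def search(query_terms):
    """Stand-in for a public search API: conjunctive (AND) keyword match."""
    terms = set(query_terms)
    return {doc_id for doc_id, text in CORPUS.items()
            if terms <= set(text.split())}


def f1(retrieved, target):
    """Harmonic mean of precision and recall of the retrieved set."""
    overlap = len(retrieved & target)
    if overlap == 0:
        return 0.0
    precision = overlap / len(retrieved)
    recall = overlap / len(target)
    return 2 * precision * recall / (precision + recall)


def generate_query(target, max_terms=3):
    """Greedily add the term that most improves F1 against the target set.

    Assumption: the client holds the text of the target documents, so
    candidate terms are read from them directly; everything else goes
    through search() only.
    """
    counts = Counter(t for d in target for t in set(CORPUS[d].split()))
    candidates = [term for term, _ in counts.most_common()]
    query, best = [], 0.0
    while len(query) < max_terms:
        scored = [(f1(search(query + [t]), target), t)
                  for t in candidates if t not in query]
        if not scored:
            break
        score, term = max(scored)  # ties are broken by term order
        if score <= best:          # stop when no term improves the result
            break
        query.append(term)
        best = score
    return query, best


if __name__ == "__main__":
    print(generate_query({"d3", "d4"}))
```

In this toy setting, the query generated for the cluster `{"d3", "d4"}` doubles as its label. Against a real search API, `search` would issue a request per candidate query, so the number of candidates tried per round becomes the main cost to control.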
