Diversifying Search Results through Pattern-Based Subtopic Modeling

Traditional information retrieval models do not necessarily provide users with an optimal search experience, because the top-ranked documents may contain excessively redundant information. Satisfactory search results should therefore be not only relevant to the query but also diversified to cover its different subtopics. In this paper, the authors propose a novel pattern-based framework for diversifying search results, in which each pattern is a set of semantically related terms covering the same subtopic. They first apply a maximal frequent pattern mining algorithm to extract patterns from the retrieval results of the query. They then model a subtopic with either a single pattern or a group of similar patterns, adapting a profile-based clustering method to group similar patterns based on their context information. The search results are then diversified using the extracted subtopics. Experimental results show that the proposed pattern-based methods are effective at diversifying search results.
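
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the brute-force miner, the size cap `max_size`, the greedy coverage rule in `diversify`, and the toy documents are all assumptions made for illustration. Each maximal pattern is treated as one subtopic (the single-pattern option), and the profile-based clustering step is omitted.

```python
from itertools import combinations


def maximal_frequent_patterns(docs, min_support=2, max_size=3):
    """Brute-force mining of maximal frequent term sets (illustrative only).

    A term set is frequent if at least `min_support` documents contain all
    of its terms. It is kept only if no frequent proper superset exists
    among the candidates of size <= max_size (an assumed cap that keeps
    the enumeration small).
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets)) if doc_sets else []
    frequent = []
    for size in range(1, max_size + 1):
        for cand in combinations(vocab, size):
            support = sum(1 for d in doc_sets if d.issuperset(cand))
            if support >= min_support:
                frequent.append(frozenset(cand))
    return [p for p in frequent if not any(p < q for q in frequent)]


def diversify(docs, patterns, k):
    """Greedy re-ranking: repeatedly pick the document that contains the
    most patterns (subtopics) not yet covered by earlier picks. Ties go
    to the document ranked higher in the original list."""
    doc_sets = [set(d) for d in docs]
    covered, ranking = set(), []
    remaining = list(range(len(docs)))

    def gain(i):
        return sum(1 for p in patterns if p not in covered and p <= doc_sets[i])

    while remaining and len(ranking) < k:
        best = max(remaining, key=gain)  # max() keeps the first of tied items
        ranking.append(best)
        remaining.remove(best)
        covered.update(p for p in patterns if p <= doc_sets[best])
    return ranking


# Hypothetical retrieval results for the ambiguous query "java".
docs = [
    ["java", "programming", "language"],
    ["java", "programming", "language", "tutorial"],
    ["java", "island", "indonesia"],
    ["java", "coffee", "bean"],
    ["java", "island", "indonesia", "travel"],
    ["java", "coffee", "bean", "roast"],
]

patterns = maximal_frequent_patterns(docs, min_support=2, max_size=3)
ranking = diversify(docs, patterns, k=3)
print(sorted(sorted(p) for p in patterns))
print(ranking)
```

On this toy input the miner finds one pattern per sense of the query, and the greedy step promotes one document per pattern ahead of the near-duplicate second result. A realistic system would replace the exhaustive enumeration with a dedicated maximal pattern miner and would combine subtopic coverage with a relevance score.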
