Recommending Questions Using the MDL-based Tree Cut Model

This paper addresses the problem of question recommendation. Specifically, given a question as a query, we retrieve and rank other questions according to their likelihood of being good recommendations for the queried question. A good recommendation offers alternative aspects of the user's interest. We tackle question recommendation in two steps: first, we represent questions as graphs of topic terms; then, we rank recommendations on the basis of those graphs. We formalize both steps as tree-cutting problems and employ the MDL (Minimum Description Length) principle to select the best cuts. Experiments were conducted on real questions posted at Yahoo! Answers, covering two domains: 'travel' and 'computers & internet'. The experimental results indicate that the MDL-based tree cut model significantly outperforms the baseline methods, word-based VSM and phrase-based VSM. The results also show that the MDL-based tree cut model is essential to our approach.
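To make the idea of selecting a cut with MDL concrete, the following is a minimal sketch of a generic MDL tree cut search. It is not the paper's implementation: the `Node` class, the function names, the toy travel hierarchy, and the counts are all hypothetical, and the cost function is the commonly used form (a parameter cost of (k/2) log2 |S| plus a data cost that spreads each class's probability uniformly over its leaves), which may differ from the formulation used in the paper.

```python
import math


class Node:
    """A node in a toy topic-term hierarchy (hypothetical structure)."""

    def __init__(self, name, children=None, count=0):
        self.name = name
        self.children = children or []
        self.count = count  # observed frequency; meaningful for leaves only

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def total(self):
        return sum(leaf.count for leaf in self.leaves())


def description_length(cut, sample_size):
    """Parameter cost plus data cost (in bits) of describing the sample with `cut`."""
    # A cut with k classes has k - 1 free probability parameters.
    param_cost = (len(cut) - 1) / 2 * math.log2(sample_size)
    data_cost = 0.0
    for node in cut:
        freq = node.total()
        if freq == 0:
            continue  # classes with no observations add no data cost
        # The class probability is shared uniformly among the leaves it covers.
        prob_per_leaf = freq / sample_size / len(node.leaves())
        data_cost -= freq * math.log2(prob_per_leaf)
    return param_cost + data_cost


def find_mdl_cut(node, sample_size):
    """Return the cut below `node` with the smaller description length (bottom-up)."""
    if not node.children:
        return [node]
    child_cut = [n for c in node.children for n in find_mdl_cut(c, sample_size)]
    collapsed = description_length([node], sample_size)
    expanded = description_length(child_cut, sample_size)
    return [node] if collapsed <= expanded else child_cut


# Hypothetical counts of topic terms in a set of travel questions.
city = Node("city", [Node("paris", count=10), Node("rome", count=10)])
nature = Node("nature", [Node("beach", count=1), Node("mountain", count=19)])
root = Node("place", [city, nature])

cut = find_mdl_cut(root, root.total())
cut_names = [n.name for n in cut]
print(cut_names)
```

In this toy example the evenly distributed terms under `city` are generalized to their parent, because keeping them separate costs extra parameters without improving the fit, while the skewed terms under `nature` stay separate. This is the trade-off between model simplicity and data fit that MDL is meant to balance.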
