Group Non-negative Matrix Factorization with Natural Categories for Question Retrieval in Community Question Answer Archives

Community question answering (CQA) has become an important service due to the popularity of CQA archives on the web. A distinctive feature of CQA services is that they usually organize questions into a hierarchy of natural categories. In this paper, we focus on the problem of question retrieval and propose a novel approach, called group non-negative matrix factorization with natural categories (GNMFNC). The approach learns category-specific topics for each category, as well as topics shared across all categories, via a group non-negative matrix factorization framework. We derive an efficient algorithm for learning the factorization, analyze its complexity, and provide a proof of convergence. Experiments are carried out on a real-world CQA data set from Yahoo! Answers. The results show that the proposed approach significantly outperforms various baseline methods and achieves state-of-the-art performance for question retrieval.
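To make the idea of shared and category-specific topics concrete, the following is a minimal sketch, not the GNMFNC algorithm itself. It assumes a plain squared-error objective in which each category's term-by-question matrix X_p is approximated by W_s H_s,p + W_p H_c,p, where W_s holds topics shared by all categories and W_p holds topics specific to category p. It uses standard multiplicative updates and omits any additional regularization the full method may apply. The function name `group_nmf`, its parameters, and the toy data are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def group_nmf(mats, k_shared=2, k_specific=2, n_iter=200, eps=1e-9, seed=0):
    """Illustrative group NMF: X_p ~ W_s @ H_s[p] + W_c[p] @ H_c[p].

    mats is a list of non-negative (terms x questions) arrays, one per
    category, all sharing the same vocabulary (row) dimension.
    """
    rng = np.random.default_rng(seed)
    m = mats[0].shape[0]
    # Random non-negative initialization (an assumption of this sketch).
    W_s = rng.random((m, k_shared)) + eps
    W_c = [rng.random((m, k_specific)) + eps for _ in mats]
    H_s = [rng.random((k_shared, X.shape[1])) + eps for X in mats]
    H_c = [rng.random((k_specific, X.shape[1])) + eps for X in mats]

    def recon(p):
        # Current reconstruction of category p from shared + specific topics.
        return W_s @ H_s[p] + W_c[p] @ H_c[p]

    def loss():
        # Sum of squared Frobenius errors over all categories.
        return sum(np.linalg.norm(X - recon(p)) ** 2 for p, X in enumerate(mats))

    history = [loss()]
    for _ in range(n_iter):
        for p, X in enumerate(mats):
            # Multiplicative updates for the factors private to category p.
            H_s[p] *= (W_s.T @ X) / (W_s.T @ recon(p) + eps)
            H_c[p] *= (W_c[p].T @ X) / (W_c[p].T @ recon(p) + eps)
            W_c[p] *= (X @ H_c[p].T) / (recon(p) @ H_c[p].T + eps)
        # The shared topics accumulate evidence from every category.
        num = sum(X @ H_s[p].T for p, X in enumerate(mats))
        den = sum(recon(p) @ H_s[p].T for p in range(len(mats)))
        W_s *= num / (den + eps)
        history.append(loss())
    return W_s, W_c, H_s, H_c, history


# Toy usage: two categories over a 6-term vocabulary (random data).
toy_rng = np.random.default_rng(1)
toy = [toy_rng.random((6, 5)), toy_rng.random((6, 4))]
W_s, W_c, H_s, H_c, history = group_nmf(toy)
print("initial loss:", history[0], "final loss:", history[-1])
```

In this sketch, a question's coefficients over the shared and category-specific topics could serve as a latent representation for matching queries against archived questions; how that representation is combined with term-based retrieval scores is left open here.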
