Improved answer ranking in social question-answering portals

Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality content challenging but all the more important. We present an approach to answer ranking and show the usefulness of features that explicitly model answer quality. Furthermore, we introduce the idea of leveraging snippets of web search results for query expansion in answer ranking. We present an evaluation setup that avoids spurious results reported in earlier work. Our results show the usefulness of our features and query expansion techniques, and point to the importance of regularization when learning from noisy data.

[1]  Hisashi Kawai,et al.  Exploiting Social Q&A Collection in Answering Complex Questions , 2010, CIPS-SIGHAN.

[2]  S. T. Buckland,et al.  Computer-Intensive Methods for Testing Hypotheses. , 1990 .

[3]  Mihai Surdeanu,et al.  Learning to Rank Answers to Non-Factoid Questions from Web Collections , 2011, CL.

[4]  Aravind K. Joshi,et al.  Ranking and Reranking with Perceptron , 2005, Machine Learning.

[5]  I. C. Mogotsi,et al.  Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to information retrieval , 2010, Information Retrieval.

[6]  Mihai Surdeanu,et al.  Learning to Rank Answers on Large Online QA Collections , 2008, ACL.

[7]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[8]  Eric Brill,et al.  Automatic question answering using the web: Beyond the Factoid , 2006, Information Retrieval.

[9]  Lin Sun,et al.  Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities , 2010, ACL.

[10]  Léon Bottou,et al.  Stochastic Learning , 2003, Advanced Lectures on Machine Learning.

[11]  W. Bruce Croft,et al.  Query expansion using local and global document analysis , 1996, SIGIR '96.

[12]  Sophia Ananiadou,et al.  Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty , 2009, ACL.

[13]  K. J. Evans,et al.  Computer Intensive Methods for Testing Hypotheses: An Introduction , 1990 .

[14]  Samy Bengio,et al.  Links between perceptrons, MLPs and SVMs , 2004, ICML.

[15]  Gunnar Rätsch,et al.  Advanced lectures on machine learning : ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, Tübingen, Germany, August 4-16, 2003 : revised lectures , 2004 .

[16]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[17]  Mehran Sahami,et al.  A web-based kernel function for measuring the similarity of short text snippets , 2006, WWW '06.

[18]  Yi Liu,et al.  Translating Queries into Snippets for Improved Query Expansion , 2008, COLING.

[19]  W. Bruce Croft,et al.  Finding similar questions in large question and answer archives , 2005, CIKM '05.

[20]  W. Bruce Croft,et al.  Finding semantically similar questions based on their answers , 2005, SIGIR '05.

[21]  Vibhu O. Mittal,et al.  Bridging the lexical chasm: statistical approaches to answer-finding , 2000, SIGIR '00.

[22]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[23]  Chris Buckley,et al.  New Retrieval Approaches Using SMART: TREC 4 , 1995, TREC.

[24]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[25]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[26]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT' 98.

[27]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[28]  S. Sathiya Keerthi,et al.  Efficient algorithms for ranking with SVMs , 2010, Information Retrieval.

[29]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[30]  Gilad Mishne,et al.  Finding high-quality content in social media , 2008, WSDM '08.