Summarizing Answers in Non-Factoid Community Question-Answering

We aim to summarize answers in community question-answering (CQA). While most previous work focuses on factoid question-answering, we focus on non-factoid question-answering. Unlike factoid CQA, non-factoid question-answering usually requires passages as answers. The shortness, sparsity and diversity of answers pose challenges for summarization. To tackle these challenges, we propose a sparse coding-based summarization strategy with three core ingredients: short document expansion, sentence vectorization, and a sparse-coding optimization framework. Specifically, we expand each answer in a question-answering thread into a more comprehensive representation via entity linking and sentence ranking strategies. Each sentence of the expanded answers is then represented as a feature vector learned by a convolutional neural network model for short text. We use these sentence representations to estimate the saliency of candidate sentences via a sparse-coding framework that jointly considers candidate sentences and Wikipedia sentences as reconstruction items. Given the saliency vectors for all candidate sentences, we extract sentences to generate an answer summary using a maximal marginal relevance algorithm. Experimental results on a benchmark data collection confirm the effectiveness of the proposed method for answer summarization in non-factoid CQA and show significant improvements over state-of-the-art baselines in terms of ROUGE metrics.
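The final extraction step described above can be illustrated with a minimal sketch of greedy maximal marginal relevance (MMR) selection. This is not the authors' implementation: it assumes each candidate sentence already has a single scalar saliency score (standing in for the output of the sparse-coding step) and a feature vector, and the names `mmr_select`, `cosine`, and `trade_off` are hypothetical, introduced only for this example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(
    vectors: Sequence[Sequence[float]],
    saliency: Sequence[float],
    k: int,
    trade_off: float = 0.7,  # assumed weight between saliency and novelty
) -> List[int]:
    """Greedily pick up to k sentence indices.

    At each step, choose the unselected sentence maximizing
    trade_off * saliency - (1 - trade_off) * (max similarity to
    any sentence already selected).
    """
    selected: List[int] = []
    remaining = list(range(len(vectors)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is 0.0 while nothing has been selected yet.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return trade_off * saliency[i] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy input: sentences 0 and 1 are near-duplicates, sentence 2 is distinct.
toy_vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
toy_saliency = [0.9, 0.85, 0.5]
summary_ids = mmr_select(toy_vectors, toy_saliency, k=2, trade_off=0.5)
```

On the toy input, sentence 1 is more salient than sentence 2 but nearly identical to the already selected sentence 0, so the redundancy penalty pushes the second pick to sentence 2. Setting `trade_off=1.0` disables the penalty and reduces the procedure to ranking by saliency alone.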
