A Data-Driven Approach to Question Subjectivity Identification in Community Question Answering

Automatic Subjective Question Answering (ASQA), which aims to answer users' subjective questions with summaries of multiple opinions, is becoming increasingly important. One challenge of ASQA is that the expected answers to subjective questions may not readily exist on the Web. The rise and popularity of Community Question Answering (CQA) sites, which provide platforms for people to post and answer questions, offer ASQA an alternative source of answers. One important task of ASQA is question subjectivity identification, which determines whether a user is asking a subjective question. Unfortunately, little labeled training data is available for this task. In this paper, we propose an approach that collects training data automatically by utilizing social signals in CQA sites, without any manual labeling. Experimental results show that our data-driven approach achieves a 9.37% relative improvement over the supervised approach using manually labeled data and a 5.15% relative gain over a state-of-the-art semi-supervised approach. In addition, we propose several heuristic features for question subjectivity identification. Adding these features yields an 11.23% relative improvement over word n-gram features under the same experimental setting.
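The abstract does not spell out which social signals are used. As a rough illustration only, the following Python sketch shows the general idea of turning an unlabeled CQA archive into labeled training data from social signals alone. The record fields (`num_answers`, `best_by_vote`, `best_has_reference`), the labeling rules, and the threshold of 10 answers are all hypothetical assumptions, not the signals or rules used in this paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CQAQuestion:
    # Hypothetical record of one archived CQA question and its social signals.
    text: str
    num_answers: int          # how many answers the question received
    best_by_vote: bool        # best answer chosen by community vote (assumed signal)
    best_has_reference: bool  # best answer cites a source such as a URL (assumed signal)


def weak_label(q: CQAQuestion, many_answers: int = 10) -> Optional[str]:
    """Assign a noisy label from social signals alone; None means 'skip'.

    The rules and the default threshold are illustrative assumptions.
    """
    if q.best_has_reference:
        return "objective"   # a sourced answer hints at a factual question
    if q.best_by_vote or q.num_answers >= many_answers:
        return "subjective"  # many opinions or a community vote hint at opinion-seeking
    return None              # ambiguous: leave out of the training set


def collect_training_data(questions: List[CQAQuestion]) -> List[Tuple[str, str]]:
    """Turn an unlabeled archive into (text, label) pairs without manual labeling."""
    data = []
    for q in questions:
        label = weak_label(q)
        if label is not None:
            data.append((q.text, label))
    return data


archive = [
    CQAQuestion("What is the boiling point of water at sea level?", 2, False, True),
    CQAQuestion("Which is the best sci-fi novel ever written?", 25, True, False),
    CQAQuestion("How do I reset my router?", 3, False, False),
]
print(collect_training_data(archive))
```

Questions that match no rule are simply skipped, which trades coverage for label precision; the resulting pairs could then be fed to any standard text classifier, for example one over word n-gram features.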
