Mining User Viewpoints in Online Discussions

Online discussion forums are a type of social media that contains rich user-contributed facts, opinions, and user interactions on diverse topics. The large volume of opinionated data generated in online discussions provides an ideal testbed for user opinion mining. In particular, mining user opinions on social and political issues from online discussions is useful not only to government organizations and companies but also to social and political scientists. In this dissertation, we study the task of mining user viewpoints, or stances, from online discussions on social and political issues. Specifically, we present our approaches to three sub-tasks: viewpoint discovery, micro-level and macro-level stance prediction, and user viewpoint summarization. First, we study how to model user posting behaviors for viewpoint discovery, and propose two models for this purpose. Our first model takes three important characteristics of online discussions into consideration: user consistency, topic preference, and user interactions. Our second model focuses on mining interaction features from structured debate posts, and studies how to incorporate such features into viewpoint discovery. Second, we study how to model user opinions for viewpoint discovery. To do so, we leverage advances in sentiment analysis to extract users' opinions from their arguments. Because user opinions are sparse in social media, we propose to apply collaborative filtering through matrix factorization to generalize the extracted opinions. Third, we study micro-level and macro-level stance prediction, and propose an integrated model that jointly models arguments, stances, and attributes. Finally, we seek to summarize viewpoints by finding representative posts, since the number of posts holding the same viewpoint may still be large.
In summary, this dissertation discusses a number of key problems in mining user viewpoints in online discussions and proposes appropriate solutions to these problems. We also discuss other related tasks and point out some future work.
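
As a rough illustration of the collaborative filtering step mentioned above, the following is a minimal sketch of matrix factorization over a sparse matrix of opinion scores. It is not the dissertation's actual model: the toy data, the function names `factorize` and `predict`, and all hyperparameters (rank, learning rate, regularization weight) are assumptions made for illustration only. Each observed cell holds a sentiment-derived score of a user toward a target, and the learned low-rank factors are used to estimate cells that were never observed.

```python
import random


def factorize(observed, n_users, n_targets, rank=2, lr=0.05, reg=0.02,
              epochs=500, seed=0):
    """Fit low-rank factors U, V to sparse opinion scores with plain SGD.

    observed maps (user, target) -> opinion score in [-1, 1].
    All hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_targets)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (u, t), score in cells:
            err = score - sum(a * b for a, b in zip(U[u], V[t]))
            for k in range(rank):
                uk, vk = U[u][k], V[t][k]
                # Gradient step on squared error with L2 regularization.
                U[u][k] += lr * (err * vk - reg * uk)
                V[t][k] += lr * (err * uk - reg * vk)
    return U, V


def predict(U, V, u, t):
    """Estimated opinion of user u toward target t (dot product of factors)."""
    return sum(a * b for a, b in zip(U[u], V[t]))


# Hypothetical toy data: 4 users, 3 targets, 8 of the 12 cells observed.
observed = {
    (0, 0): 1.0, (0, 1): -1.0,
    (1, 0): 1.0, (1, 2): 1.0,
    (2, 0): -1.0, (2, 1): 1.0,
    (3, 1): 1.0, (3, 2): -1.0,
}

U, V = factorize(observed, n_users=4, n_targets=3)

# Estimate an opinion that was never extracted from text.
print(round(predict(U, V, 0, 2), 3))
```

The regularization term keeps the factors small, which matters here because only a few cells per user are observed. In a viewpoint discovery setting, the completed scores (or the user factors themselves) could then serve as input to a clustering step, though how they are used is outside the scope of this sketch.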
