Shell Miner: Mining Organizational Phrases in Argumentative Texts in Social Media

Threaded debate forums have become one of the major social media platforms. People in these forums argue with one another using not only claims and evidence about the topic under discussion but also language that organizes those claims and evidence, which we refer to as shell. In this paper, we study how to separate shell from topical content using unsupervised methods. To this end, we develop a latent variable model named Shell Topic Model (STM) that jointly models topics and shell. Experiments on real online debate data show that our model finds both meaningful shell and meaningful topics. Comparisons with several baselines on shell phrase extraction and document modeling further demonstrate the model's effectiveness.
