Re-using an Argument Corpus to Aid in the Curation of Social Media Collections

This work investigates how automated methods can be used to classify social media text into argumentation types. In particular, it shows how supervised machine learning was used to annotate a Twitter dataset (London Riots) with argumentation classes. An investigation of issues arising from the natural inconsistency of social media data found that machine learning algorithms tend to overfit, because Twitter contains a lot of repetition in the form of retweets. When learning argumentation classes, the classes will most likely be of very different sizes, and this imbalance must be kept in mind when analysing the results. Encouraging results were found in adapting a model from one domain of Twitter data (London Riots) to another (OR2012). When adapting a model to another dataset, the most useful feature was punctuation. It is probable that the nature of punctuation in Twitter language, in particular its very specific use in links, indicates argumentation class.
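The two cautions in the abstract, retweet repetition and unequal class sizes, can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names (`normalise`, `drop_retweets`, `per_class_f1`, `macro_f1`), the `RT @user:` prefix convention, and the labels `claim` and `other` are assumptions made purely for illustration. The idea is to collapse retweets to one copy before any train/test split, so that near-identical text cannot appear on both sides, and to report per-class scores so that a small class is not hidden behind overall accuracy.

```python
import re
from collections import Counter

# Assumed convention: retweets start with one or more "RT @user:" markers.
RT_PREFIX = re.compile(r"^(?:rt\s+@\w+:?\s*)+", re.IGNORECASE)


def normalise(text):
    """Strip leading retweet markers, lowercase, and collapse whitespace."""
    text = RT_PREFIX.sub("", text.strip())
    return " ".join(text.lower().split())


def drop_retweets(tweets):
    """Keep the first (text, label) pair seen for each normalised text."""
    seen, unique = set(), []
    for text, label in tweets:
        key = normalise(text)
        if key not in seen:
            seen.add(key)
            unique.append((text, label))
    return unique


def per_class_f1(gold, predicted):
    """Return {label: F1} so each class is scored on its own."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[g] += 1  # gold label gains a false negative
    scores = {}
    for label in set(gold) | set(predicted):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1; every class counts equally."""
    scores = per_class_f1(gold, predicted)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical tweets and labels, invented for this sketch.
    tweets = [
        ("Police on the high street", "claim"),
        ("RT @a: Police on the high street", "claim"),
        ("rt @b RT @a: police on the  high street", "claim"),
        ("Is this true?", "other"),
    ]
    print(len(drop_retweets(tweets)))

    gold = ["claim", "claim", "claim", "other"]
    predicted = ["claim"] * 4  # a classifier that always picks the big class
    print(per_class_f1(gold, predicted), macro_f1(gold, predicted))
```

In the toy run, a classifier that always predicts the majority class is right on three of four tweets, yet its F1 on the minority class is zero, which is the kind of effect that unequal class sizes can mask when only accuracy is reported.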
