Are We There Yet?: The Development of a Corpus Annotated for Social Acts in Multilingual Online Discourse

We present the AAWD and AACD corpora, a collection of discussions drawn from Wikipedia talk pages and small-group IRC discussions in English, Russian, and Mandarin. The datasets are annotated with labels capturing two kinds of social acts: alignment moves and authority claims. We define these social acts, describe our annotation process, highlight the challenges we encountered and the strategies we employed during annotation, and present analyses of the resulting dataset. These analyses illustrate the utility of the corpus and identify interactions among social acts, and between participant status and social acts, in online discourse.
