22nd International Conference on Computational Linguistics: Proceedings of the Workshop on Human Judgements in Computational Linguistics

We address the problem of distinguishing between two sources of disagreement in annotations: genuine subjectivity and slips of attention. The latter are especially likely when the classification task has a default class, as in tasks where annotators need to find instances of the phenomenon of interest, such as the metaphor detection task discussed here. We apply and extend a data analysis technique proposed by Beigman Klebanov and Shamir (2006), first to distill reliably deliberate (non-chance) annotations and then to estimate the proportion of attention slips versus genuine disagreement among those reliably deliberate annotations.
