Enriching news events with meta-knowledge information

Given the vast amounts of data available in digitised textual form, it is important to provide mechanisms that allow users to extract nuggets of relevant information from the ever-growing volumes of potentially important documents. Text mining techniques can help by automatically extracting relevant event descriptions, which link entities with the situations described in the text. However, these event descriptions cannot be interpreted correctly and completely without considering additional contextual information, which is often present in the surrounding text. This information, which we refer to as meta-knowledge, can include (but is not restricted to) the modality, subjectivity, source, polarity and specificity of the event. We have developed a meta-knowledge annotation scheme specifically tailored to news events, which covers six aspects of event interpretation. We have applied this annotation scheme to the ACE 2005 corpus, which contains 599 documents from various written and spoken news sources. We have also identified and annotated the words and phrases that evoke the different types of meta-knowledge. Evaluation of the annotated corpus shows high levels of inter-annotator agreement for five meta-knowledge attributes and a moderate level of agreement for the sixth. Detailed analysis of the annotated corpus has revealed further insights into how the different types of meta-knowledge are expressed, their relative frequencies and their mutual correlations.
