News search using discourse analytics

Vast collections of digitised documents containing historical data constitute a rich research data repository. However, the computational methods and tools available to explore these data remain limited in functionality, and research on historical archives is still largely carried out manually. Text mining technologies offer novel ways to analyse digital content: they identify various types of semantic information in these documents and extract it as semantic metadata. Methods range from the automatic identification of named entities (e.g., people, places and organisations) to more sophisticated techniques that extract information about events (e.g., births, deaths and arrests), allowing users to make their searches considerably more specific. We have created an extended model of event interpretation that allows searches to be refined according to various discourse facets. These include isolating definite information about events from more speculative details, distinguishing positive from negative opinions, and categorising events according to their information source. We present ISHER as an example of a multifaceted, semantically oriented system for searching New York Times news articles dating back to 1987. We explain how our extended event interpretation model can enhance search capabilities in systems such as ISHER, including the identification of contrasting and contradictory information in news articles.
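Faceted refinement of event search can be illustrated with a minimal sketch. It assumes that each extracted event mention carries three facet values stored as plain strings: certainty, sentiment and information source. The `EventMention` class, the facet values, and the helpers `facet_search` and `contrasting_pairs` are hypothetical and do not represent ISHER's actual data model or API. The sketch also treats two mentions as "contrasting" when they share an event type and arguments but carry opposite sentiment. That definition is an illustrative assumption, not the model's own definition.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


# Hypothetical data model for illustration only; not ISHER's actual schema.
@dataclass(frozen=True)
class EventMention:
    doc_id: str
    event_type: str             # e.g. "Arrest", "Death"
    arguments: Tuple[str, ...]  # named entities participating in the event
    certainty: str              # assumed values: "definite" or "speculative"
    sentiment: str              # assumed values: "positive", "negative" or "neutral"
    source: str                 # assumed values: "writer" or "reported" (attributed)


def facet_search(events: List[EventMention], event_type: str,
                 certainty: Optional[str] = None,
                 sentiment: Optional[str] = None,
                 source: Optional[str] = None) -> List[EventMention]:
    """Return mentions of event_type that match every facet not left as None."""
    results = []
    for e in events:
        if e.event_type != event_type:
            continue
        if certainty is not None and e.certainty != certainty:
            continue
        if sentiment is not None and e.sentiment != sentiment:
            continue
        if source is not None and e.source != source:
            continue
        results.append(e)
    return results


def contrasting_pairs(events: List[EventMention]) -> List[Tuple[EventMention, EventMention]]:
    """Pair mentions of the same event (same type and arguments) where one
    has positive sentiment and the other negative (an assumed heuristic)."""
    pairs = []
    for a, b in combinations(events, 2):
        same_event = (a.event_type, a.arguments) == (b.event_type, b.arguments)
        if same_event and {a.sentiment, b.sentiment} == {"positive", "negative"}:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    events = [
        EventMention("nyt-1", "Arrest", ("Smith",), "definite", "neutral", "writer"),
        EventMention("nyt-2", "Arrest", ("Smith",), "speculative", "negative", "reported"),
        EventMention("nyt-3", "Arrest", ("Smith",), "definite", "positive", "reported"),
    ]
    print([e.doc_id for e in facet_search(events, "Arrest", certainty="definite")])
    # prints ['nyt-1', 'nyt-3']
    print([(a.doc_id, b.doc_id) for a, b in contrasting_pairs(events)])
    # prints [('nyt-2', 'nyt-3')]
```

Each facet in this sketch is an independent optional filter, so any combination narrows the result set without affecting the others. This mirrors how a faceted search interface lets users stack refinements such as "only definite events reported by other sources".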
