Slicing and Dicing a Newspaper Corpus for Historical Ecology Research

Historical newspapers are a novel source of information that historical ecologists can use to study the interactions between humans and animals through time and space. Newspaper archives are particularly interesting to analyse because of their breadth and depth. However, the size and occasional noisiness of such archives also bring difficulties, as manual analysis is infeasible at this scale. In this paper, we present experiments and results on automatic query expansion and categorisation for studying the perception of animal species between 1800 and 1940. We used lexicons for query expansion and to support the manual annotation process. For the categorisation, we trained a Support Vector Machine model. Our results indicate that we can distinguish newspaper articles that are about animal species from those that are not with an F\(_{1}\) of 0.92, and that we can subcategorise the different types of newspaper articles on animals with an F\(_{1}\) of up to 0.84.
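As a reading aid for the reported scores, the following is a minimal sketch of how an F\(_{1}\) value is derived from gold and predicted article labels. The label names ("animal", "other"), the function name, and the toy data are hypothetical illustrations; they are not the paper's data or evaluation code.

```python
def f1_score(gold, predicted, positive="animal"):
    """Return (precision, recall, F1) for one class label.

    `gold` and `predicted` are equally long lists of labels.
    The label "animal" is a hypothetical name for the positive class.
    """
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example (invented labels, not results from the paper).
gold = ["animal", "animal", "other", "animal", "other"]
predicted = ["animal", "other", "other", "animal", "animal"]
precision, recall, f1 = f1_score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F\(_{1}\) is a harmonic mean, it is high only when both precision and recall are high, which is why it is a common single-number summary for tasks such as separating articles about animals from other articles.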
