Named entity recognition goes to old regime France: geographic text analysis for early modern French corpora

ABSTRACT Geographic text analysis (GTA) research in the digital humanities has focused on projects analyzing modern English-language corpora. These projects depend on temporally specific lexicons and gazetteers that enable place name identification and georesolution. Scholars working on the early modern period (1400–1800) lack temporally appropriate geoparsers and gazetteers and have relied on general-purpose linked open data services like GeoNames. These anachronistic resources introduce significant information retrieval and ethical challenges for early modernists. Using the geography entries of the canonical eighteenth-century Encyclopédie, we evaluate rule-based named entity recognition (NER) systems to pinpoint where they would benefit from adjustments for processing historical corpora. As we demonstrate, annotating nested and extended place information is one way to improve early modern GTA. Working with Enlightenment sources also motivates a critique of the landscape of digital geospatial data.
