Domain-specific Named Entity Disambiguation in Historical Memoirs

This paper presents the results of extracting named entities from a collection of historical memoirs about the Italian Resistance during World War II. We discuss the methodology followed for the extraction and disambiguation task, as well as its evaluation. For the semantic annotation of the dataset, we developed a lookup-based pipeline that follows established practices for extracting and disambiguating named entities. This was necessary given the poor performance of the out-of-the-box Named Entity Recognition and Disambiguation (NERD) tools tested in the initial phase of this work.
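To make the idea of lookup-based extraction and disambiguation concrete, the following is a minimal sketch and not the authors' pipeline, which the abstract does not detail. It assumes a small hand-built gazetteer (`GAZETTEER`) mapping surface forms to candidate entities, and it picks the candidate whose context keywords overlap most with the surrounding words. The entity identifiers, the keyword sets, and the function names are all hypothetical.

```python
import re

# Hypothetical domain gazetteer: surface form -> candidate entities.
# Each candidate carries illustrative context keywords (assumed, not from the paper).
GAZETTEER = {
    "garibaldi": [
        {"id": "person/giuseppe_garibaldi", "context": {"risorgimento", "general", "1860"}},
        {"id": "unit/brigate_garibaldi", "context": {"brigade", "partisans", "resistance"}},
    ],
    "turin": [
        {"id": "place/turin", "context": set()},
    ],
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def link_entities(text, window=5):
    """Find gazetteer mentions and pick, for each, the candidate whose
    context keywords overlap most with the tokens around the mention."""
    tokens = tokenize(text)
    links = []
    for i, tok in enumerate(tokens):
        candidates = GAZETTEER.get(tok)
        if not candidates:
            continue
        # Tokens within `window` positions on either side of the mention.
        context = set(tokens[max(0, i - window): i + window + 1])
        # On ties, max() keeps the first candidate listed in the gazetteer.
        best = max(candidates, key=lambda c: len(c["context"] & context))
        links.append((tok, best["id"]))
    return links


links = link_entities("The partisans of the Garibaldi brigade entered Turin")
```

In this sketch the ambiguous mention "Garibaldi" resolves to the partisan unit rather than the person, because "partisans" and "brigade" appear nearby. A domain-specific gazetteer of this kind is one way a lookup approach can outperform general-purpose tools on historical text, at the cost of building and maintaining the lookup tables.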
