Massive digitization of archival material, coupled with automatic document processing techniques and data visualization tools, offers great opportunities for reconstructing and exploring the past. An unprecedented wealth of historical data (e.g. names of persons, places, transaction records) can be gathered through the transcription and annotation of digitized documents, and can thereby foster large-scale studies of past societies. Yet the transformation of handwritten documents into well-represented, structured and connected data is not straightforward and requires several processing steps. A key issue in this regard is entity record linkage, the process of linking different mentions in texts that refer to the same entity. Also known as entity disambiguation, record linkage is essential because it makes it possible to identify genuine individuals, to aggregate multi-source information about single entities, and to reconstruct networks across documents and document series. In this paper we present an approach to automatically identify coreferential entity mentions of type Person in a data set derived from Venetian apprenticeship contracts from the early modern period (16th-18th c.). Taking advantage of a manually annotated subset of the document series, we compute distances between pairs of mentions by combining various similarity measures based on (sparse) context information and person attributes.