Named Entity Recognition and Classification on Historical Documents: A Survey

After decades of massive digitisation, an unprecedented number of historical documents are available in digital format, along with their machine-readable texts. While this represents a major step forward for preservation and accessibility, it also opens up new opportunities for content mining, and the next fundamental challenge is to develop appropriate technologies to efficiently search, retrieve and explore information from this ‘big data of the past’. Among semantic indexing opportunities, the recognition and classification of named entities are in great demand among humanities scholars. Yet named entity recognition (NER) systems are heavily challenged by diverse, historical and noisy inputs. In this survey, we present the array of challenges that historical documents pose to NER, inventory existing resources, describe the main approaches deployed so far, and identify key priorities for future developments.
