Who mentions whom? Recognizing political actors in proceedings

We show that it is straightforward to train a state-of-the-art named entity tagger (spaCy) to recognize political actors in Dutch parliamentary proceedings with high accuracy. The tagger was trained on 3.4K manually labeled examples, which were created in a modest 2.5 days of work. This resource is available on GitHub. Besides proper names of persons and political parties, the tagger can recognize quite complex definite descriptions referring to cabinet ministers, ministries, and parliamentary committees. We also provide a demo search engine that uses the tagged entities in its search engine result page (SERP) and result summaries.
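Accuracy claims for a named entity tagger are commonly backed by entity-level scores. The following stdlib-only Python sketch computes exact-match precision, recall and F1 over character-offset spans `(start, end, label)`, the offset style spaCy accepts for entity annotations. It is an illustration under assumptions, not necessarily the metric behind the figures reported for this tagger. The labels `MINISTER` and `PARTY`, the helper `span`, and the example sentence are all hypothetical.

```python
from typing import List, Sequence, Tuple

# An entity annotation as character offsets: (start_char, end_char, label).
Span = Tuple[int, int, str]


def span(text: str, phrase: str, label: str) -> Span:
    """Hypothetical helper: offsets of the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return (start, start + len(phrase), label)


def entity_prf(gold: Sequence[List[Span]], pred: Sequence[List[Span]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact (start, end, label) matches."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        g, p = set(gold_doc), set(pred_doc)
        tp += len(g & p)  # same boundaries and same label
        fp += len(p - g)  # predicted, but not in the gold annotation
        fn += len(g - p)  # in the gold annotation, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage with one invented Dutch sentence and hypothetical labels.
text = "De minister van Financiën reageerde op vragen van de VVD."
gold = [[span(text, "minister van Financiën", "MINISTER"), span(text, "VVD", "PARTY")]]
# The prediction truncates the definite description to its head noun.
pred = [[span(text, "minister", "MINISTER"), span(text, "VVD", "PARTY")]]
print(entity_prf(gold, pred))  # → (0.5, 0.5, 0.5)
```

Because matching is exact, the truncated "minister" counts both as a false positive and as a false negative for the full description "minister van Financiën". This strictness matters most for the long definite descriptions referring to ministers, ministries, and committees.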
