The debates of the European Parliament as Linked Open Data

The European Parliament represents the citizens of the member states of the European Union (EU). The accounts of its meetings and related documents are published as open data to promote transparency and accountability, and researchers use them as source data. However, the official portal for these documents provides only limited search facilities. This paper presents LinkedEP, a Linked Open Data translation of the verbatim reports of the plenary meetings of the European Parliament. These data are integrated with a database of political affiliations of the Members of Parliament, and enriched with detected topics from the EU's topic hierarchy and links to four other Linked Open Datasets. The results of this work are available through a SPARQL endpoint and a user interface with extensive browse and search facilities. A single query can now combine the time and topic of a debate, the spoken words (in any available translation), and information about the speaker, such as affiliations to countries, parties and committees. This paper discusses the design and creation of the vocabulary, data and links, as well as known uses of the data.
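
To make the idea of such a combined query concrete, the following is a minimal sketch that assembles a SPARQL query string filtering on topic, date range, translation language and speaker country. The `ex:` namespace and every property name in it are hypothetical placeholders chosen for illustration; they are not the LinkedEP vocabulary, which the abstract does not specify, and the sketch does not contact any endpoint.

```python
from datetime import date
from string import Template

# Hypothetical namespace and property names, for illustration only.
# They are NOT the actual LinkedEP vocabulary.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/debates#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?speech ?text ?speakerName ?party
WHERE {
  ?debate ex:date ?date ;
          ex:topic ?topic ;
          ex:hasSpeech ?speech .
  ?topic ex:label "$topic"@en .
  ?speech ex:text ?text ;
          ex:speaker ?speaker .
  ?speaker ex:name ?speakerName ;
           ex:country "$country" ;
           ex:party ?party .
  FILTER (lang(?text) = "$lang")
  FILTER (?date >= "$start"^^xsd:date && ?date <= "$end"^^xsd:date)
}
LIMIT $limit
""")


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(topic: str, country: str, lang: str,
                start: str, end: str, limit: int = 50) -> str:
    """Build one query combining topic, time span, translation language and speaker country."""
    # Reject malformed dates early; fromisoformat raises ValueError on bad input.
    date.fromisoformat(start)
    date.fromisoformat(end)
    if not lang.isalpha():
        raise ValueError("lang must be a plain language code such as 'en'")
    return QUERY_TEMPLATE.substitute(
        topic=escape_literal(topic),
        country=escape_literal(country),
        lang=lang,
        start=start,
        end=end,
        limit=int(limit),
    )


if __name__ == "__main__":
    print(build_query("fisheries", "Netherlands", "en", "2009-01-01", "2009-12-31"))
```

The resulting string could be sent to a SPARQL endpoint with any HTTP or SPARQL client once the placeholder terms are replaced by the vocabulary documented for the dataset.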
