From documents to data: linked data at the Dutch Parliament

Parliamentary debates are important to the general public and to scientific research in numerous fields, such as political science, history, linguistics and communication studies; they are also an interesting domain in which to apply state-of-the-art information retrieval technology. The proceedings of these debates are highly structured transcripts of meetings of politicians in parliament. They are an important part of the cultural heritage of countries; they are often free of copyright; citizens often have a legal right to inspect them; and several countries are making great efforts to digitise their entire historical collections and open them up to the general public. In this paper, we analyse the structure of parliamentary proceedings, show how proceedings in PDF format can be transformed into XML, and describe the use of permanent identifiers for entities in parliamentary texts. Having the proceedings in XML makes a wide range of applications possible. We elaborate on four of these: entry point retrieval; advanced content and structure search; automatic creation of tables of contents and hyperlinked navigation menus; and large savings on storage space and bandwidth for scanned documents. We also describe how this approach benefits the transparency of the parliamentary process for citizens and stakeholders.
