Linked Open Piracy: A Story about e-Science, Linked Data, and Statistics

Large numbers of semi-structured event reports are written and published on the World Wide Web every day. These reports are primarily meant for human readers. A recent movement adds RDF metadata to make automatic processing by computers easier. A fine example is the open government data initiative, which represents data from spreadsheets and textual reports in RDF in order to speed up the creation of geographical mashups and visual analytics applications. In this paper, we present a new linked dataset and the method we used to automatically translate semi-structured reports on the Web into an RDF event model. We demonstrate how the semantic representation layer makes it easy to analyze and visualize the aggregated reports and to answer domain questions through a SPARQL client for the R statistical programming language. We showcase our method on piracy attack reports issued by the International Chamber of Commerce's Commercial Crime Services (ICC-CCS). Our pipeline converts the reports to RDF, links their parts to external resources from the linked open data cloud, and exposes the result on the Web.
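As a rough illustration of the conversion step described above, the following Python sketch turns one semi-structured report into RDF triples serialized as N-Triples. It is not the pipeline used in the paper. The report text, its field names, the `http://example.org/piracy/` base URI, and all function names are hypothetical, and the choice of Simple Event Model (SEM) properties is an assumption about how such a report might be mapped.

```python
import re

# SEM namespace (assumed vocabulary for the event model) and rdf:type.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base URI for the minted resources.
BASE = "http://example.org/piracy/"

# Made-up report with hypothetical field names, for illustration only.
REPORT = """\
Attack Number: 2010/001
Date: 2010-01-05
Type of Attack: Hijacked
Location: Gulf of Aden
"""


def parse_report(text):
    """Split 'Key: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def slug(value):
    """Turn free text into a lowercase, URI-safe local name."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def report_to_triples(fields):
    """Map the parsed fields onto SEM-style event triples."""
    event = "<" + BASE + "event/" + slug(fields["Attack Number"]) + ">"
    event_type = "<" + BASE + "type/" + slug(fields["Type of Attack"]) + ">"
    place = "<" + BASE + "place/" + slug(fields["Location"]) + ">"
    timestamp = '"' + fields["Date"] + '"'
    return [
        (event, "<" + RDF_TYPE + ">", "<" + SEM + "Event>"),
        (event, "<" + SEM + "eventType>", event_type),
        (event, "<" + SEM + "hasPlace>", place),
        (event, "<" + SEM + "hasTimeStamp>", timestamp),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(s + " " + p + " " + o + " ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(report_to_triples(parse_report(REPORT))))
```

Once reports are in this form, the place and event type resources can be linked to external datasets, and the triples can be loaded into a triple store and queried with SPARQL.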
