Interoperability Conflicts in Linked Open Statistical Data

A large share of Open Data is statistical in nature and describes economic and social indicators such as population size, inflation, trade, and employment. Combining and analyzing Open Data from multiple datasets and sources enables advanced data analytics scenarios that could result in valuable services and data products. However, it is still difficult to discover and combine Open Statistical Data that reside in different data portals. Although Linked Open Statistical Data (LOSD) provide standards and approaches that facilitate combining statistics on the Web, various interoperability challenges remain. In this paper, we propose an Interoperability Framework for LOSD, comprising definitions of LOSD interoperability conflicts as well as the modelling practices currently used by six official open government data portals. To this end, we combine a top-down approach that studies interoperability conflicts in the literature with a bottom-up approach that studies the modelling practices of data portals. We define two types of LOSD schema-level conflicts, namely naming conflicts and structural conflicts. Naming conflicts result from using different URIs. Structural conflicts result from different practices of modelling the structure of data cubes. Of the 19 conflicts identified, only two are currently resolved, and 11 can be resolved according to the literature.
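As an illustration of a naming conflict, the following minimal sketch compares the dimension URIs of two hypothetical portals. It assumes that the dimensions have already been aligned to shared concept labels ("time", "area"), which is an assumption of this example and not part of the framework. The `example.org` URI and the function name `naming_conflicts` are invented for illustration; `refPeriod` and `refArea` come from the SDMX dimension vocabulary commonly used with RDF data cubes.

```python
# Namespace of the SDMX dimension vocabulary (real URIs).
SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"

# Hypothetical portal A reuses the SDMX URIs for both dimensions.
portal_a = {
    "time": SDMX_DIM + "refPeriod",
    "area": SDMX_DIM + "refArea",
}

# Hypothetical portal B mints its own URI for the time dimension
# (example.org is a placeholder, not a real portal).
portal_b = {
    "time": "http://example.org/def/dimension/timePeriod",
    "area": SDMX_DIM + "refArea",
}


def naming_conflicts(schema_a, schema_b):
    """Return the concepts that both schemas model with different URIs."""
    shared = schema_a.keys() & schema_b.keys()
    return {
        concept: (schema_a[concept], schema_b[concept])
        for concept in sorted(shared)
        if schema_a[concept] != schema_b[concept]
    }


conflicts = naming_conflicts(portal_a, portal_b)
for concept, (uri_a, uri_b) in conflicts.items():
    print(f"{concept}: {uri_a} vs {uri_b}")
```

In this sketch only the time dimension is reported, because both portals use the same URI for the area dimension. Detecting structural conflicts would require comparing how each cube is modelled and is not covered here.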
