Towards Interoperable Open Statistical Data

An important part of Open Data is statistical in nature and describes economic and social indicators that monitor population size, inflation, trade, and employment. Combining and analysing Open Data from multiple datasets and sources enables advanced data analytics scenarios that could result in valuable services and data products. However, it is still difficult to discover and combine open statistical data that reside in different data portals. Although Linked Open Statistical Data (LOSD) provide standards and approaches that facilitate combining statistics on the Web, various interoperability challenges still exist. In this paper, we define the interoperability conflicts that hamper combining and analysing LOSD from different portals. To this end, we start from a thorough literature review of interoperability conflicts in databases and data warehouses. Based on this review, we identify the conflicts that may appear in LOSD. We define two types of schema-level conflicts, namely naming conflicts and structural conflicts. Naming conflicts include homonyms and synonyms and result from the different URIs used in the data cubes. Structural conflicts result from different practices for modelling the structure of data cubes.
