Matching disparate dimensions for analytical integration of heterogeneous data sources

The paper presents the first steps towards the authors' own methodology for integrating heterogeneous data. Exposing information from multiple heterogeneous data sources requires a global (mediated) schema: a model that copes with the mismatches between the schemata of different sources and provides uniform access to the data. A virtual global schema appears more convenient for assembling big data sources, because it avoids the time spent materializing and synchronizing copies of the data. Therefore, an integral analytical model is proposed as the global schema of heterogeneous data sources. The model provides virtual integration of complex and diverse information for further analytical processing. It combines the original multidimensional design with a lattice structure based on formal concept analysis (FCA). The main goal of the paper is to suggest an approach to automatic mapping, moderated by a human, between the schemata of disparate data sources and the virtual integral analytical model.
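The abstract does not specify how the mapping is computed. A minimal sketch follows. It assumes that schema elements are matched by name alone. Each source attribute is scored against the dimension names of the integral model using q-gram Jaccard similarity, a swapped-in technique chosen only for illustration. Confident matches are accepted automatically, and borderline ones are queued for a human moderator. The dimensions mapped for each source then form a formal context. Its formal concepts, computed here by brute-force closure and therefore feasible only for small contexts, show which groups of sources share which dimensions. The function names, thresholds and name-only matching are hypothetical assumptions, not the authors' method.

```python
from itertools import combinations


def qgrams(text, q=3):
    """Return the set of q-grams of a lower-cased string padded with '#'."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def similarity(a, b, q=3):
    """Jaccard similarity of the q-gram sets of two element names."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb)


def propose_mappings(source_attrs, model_dims, accept=0.6, review=0.3):
    """Map each source attribute to its most similar model dimension.

    Hypothetical thresholds: scores >= accept are mapped automatically,
    scores in [review, accept) go to a human moderator, lower scores
    yield no proposal at all.
    """
    accepted, to_review = {}, {}
    for attr in source_attrs:
        best = max(model_dims, key=lambda d: similarity(attr, d))
        score = similarity(attr, best)
        if score >= accept:
            accepted[attr] = best
        elif score >= review:
            to_review[attr] = (best, round(score, 2))
    return accepted, to_review


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a small context.

    `context` maps each object (here: a data source) to the set of its
    attributes (here: the model dimensions it is mapped to). Every extent
    is the closure of some object subset, so closing all subsets finds
    every concept (exponential cost, illustration only).
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by all objects; the empty set shares everything.
        if not objs:
            return set(all_attrs)
        return set.intersection(*(context[o] for o in objs))

    def extent(attrs):
        # Objects that have every attribute in `attrs`.
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            b = intent(set(objs))
            concepts.add((frozenset(extent(b)), frozenset(b)))
    return concepts
```

For example, `propose_mappings(["Time", "region_name"], ["time", "region", "damage"])` accepts `Time -> time` outright and sends `region_name -> region` to the moderator. Separating an automatic band from a review band mirrors the "automatic mapping with human moderation" described above. A real system would also need to compare data types, instances and hierarchies, not only names.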
