The potential of semantic paradigm in warehousing of big data

ABSTRACT Big data have analytical potential that was hard to realize with available technologies. After the emergence of new storage paradigms intended for big data, such as NoSQL databases, traditional systems were pushed out of focus. Current research concentrates either on reconciling the two paradigms at different levels or on replacing one with the other. In the same way, NoSQL databases have started to push traditional (relational) data warehouses out of both the research focus and, increasingly, the practical one. Data warehousing is known for its strict modelling process, which captures the essence of the business processes. For that reason, mere integration to bridge the NoSQL gap is not enough; the issue must be addressed at a higher level of abstraction, during the modelling phase. NoSQL databases generally lack a clear, unambiguous schema, which makes their contents difficult to comprehend and harder to integrate and analyse. This motivated the use of semantic web technologies to enrich NoSQL database contents with additional meaning and context. This paper reviews the application of semantics in data integration and data warehousing and analyses its potential for integrating NoSQL data with traditional data warehouses, with some focus on document stores. It also proposes directions for future research on the modelling phases of big data warehouses.
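To make the schema problem of document stores concrete, the following is a minimal Python sketch, not taken from the paper, of two steps the abstract alludes to: profiling the implicit schema variants of a document collection, and lifting documents into subject-predicate-object triples that a semantic layer could then annotate. The namespace `http://example.org/dw#`, the function names and the sample documents are hypothetical; the sketch assumes MongoDB-style documents that carry an `_id` field and emits plain string objects instead of typed RDF literals.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Document = Dict[str, Any]
Triple = Tuple[str, str, str]

# Hypothetical namespace, used for illustration only.
EX = "http://example.org/dw#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def flatten_leaves(doc: Document, prefix: str = "") -> Dict[str, Any]:
    """Map every leaf path of a (possibly nested) document to its value.

    Nested objects are descended into; lists and scalars are treated as leaves.
    """
    leaves: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            leaves.update(flatten_leaves(value, path))
        else:
            leaves[path] = value
    return leaves


def profile_schema(docs: Iterable[Document]) -> Counter:
    """Count how many documents share each implicit schema.

    A schema variant is the set of (leaf path, Python type name) pairs.
    """
    variants: Counter = Counter()
    for doc in docs:
        signature = frozenset(
            (path, type(value).__name__) for path, value in flatten_leaves(doc).items()
        )
        variants[signature] += 1
    return variants


def to_triples(doc: Document, collection: str) -> List[Triple]:
    """Lift one document into triples under the hypothetical EX namespace.

    Assumes the document has an `_id` field; objects are plain strings.
    """
    subject = f"{EX}{collection}/{doc['_id']}"
    triples: List[Triple] = [(subject, RDF_TYPE, f"{EX}{collection.capitalize()}")]
    for path, value in sorted(flatten_leaves(doc).items()):
        if path == "_id":
            continue
        triples.append((subject, f"{EX}{path.replace('.', '_')}", str(value)))
    return triples


if __name__ == "__main__":
    orders = [
        {"_id": 1, "customer": "Ana", "total": 120.5},
        {"_id": 2, "customer": "Ivo", "total": 80.0},
        {"_id": 3, "customer": {"name": "Eva", "city": "Zagreb"}, "total": 42.0},
    ]
    for signature, count in profile_schema(orders).items():
        print(count, sorted(signature))
    for triple in to_triples(orders[2], "order"):
        print(triple)
```

Profiling reveals that the same logical entity (a customer) appears once as a string and once as a nested object, which is exactly the kind of ambiguity a warehouse designer has to resolve before deciding on dimensions; mapping the resulting predicates to shared ontology terms is the step where semantic enrichment would come in.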
