Extracting OLAP Cubes From Document-Oriented NoSQL Database Based on Parallel Similarity Algorithms

Today, relational databases are poorly suited to data management because of the large variety and volume of data, most of which are untrusted. NoSQL has therefore attracted the attention of companies. Although NoSQL is a proper choice for managing large volumes of varied data, performing online analytical processing (OLAP) on it is challenging because it is schema-less. This article introduces a model that overcomes the null-value problem in converting document-oriented NoSQL databases into relational databases using parallel similarity techniques. The proposed model includes four phases: shingling, chunk, minhashing, and locality-sensitive hashing MapReduce (LSHMR). Each phase performs its own processing on the input NoSQL database. LSHMR combines locality-sensitive hashing (LSH) and MapReduce (MR): the LSH similarity search technique runs on the MR framework to extract OLAP cubes. LSH decreases the number of comparisons, and MR enables efficient distributed and parallel computing. The proposed model is an efficient and suitable approach for extracting OLAP cubes from a NoSQL database.
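
As a rough illustration of how shingling, minhashing, and LSH banding fit together in a map/reduce style, the following is a minimal single-process sketch, not the article's implementation. Everything in it is an assumption made for illustration: the function names, character-level shingles of length k, 32 hash functions derived from BLAKE2b, and 8 bands. The chunk phase is left out because the abstract does not describe it, and a real deployment would run the map and reduce steps on a MapReduce framework rather than in plain Python.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-character shingles of a string (assumed shingle type)."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token, parameterized by an integer seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=32):
    """Minhash signature: the minimum hash value per seeded hash function."""
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


def lsh_map(doc_id, signature, bands=8):
    """Map step: emit one ((band index, band contents), doc_id) pair per band."""
    rows = len(signature) // bands
    for b in range(bands):
        yield (b, tuple(signature[b * rows:(b + 1) * rows])), doc_id


def lsh_reduce(pairs):
    """Reduce step: group doc ids by band key; ids sharing a bucket become candidates."""
    buckets = defaultdict(list)
    for key, doc_id in pairs:
        buckets[key].append(doc_id)
    candidates = set()
    for ids in buckets.values():
        candidates.update(combinations(sorted(ids), 2))
    return candidates


def candidate_pairs(docs, k=3, num_hashes=32, bands=8):
    """Run shingling, minhashing, and the LSH map/reduce steps over {doc_id: text}."""
    emitted = []
    for doc_id, text in docs.items():
        signature = minhash_signature(shingles(text, k), num_hashes)
        emitted.extend(lsh_map(doc_id, signature, bands))
    return lsh_reduce(emitted)


if __name__ == "__main__":
    # Hypothetical documents, flattened to strings of their field names.
    docs = {
        "a": "name,age,city",
        "b": "name,age,city",
        "c": "title,price,stock",
    }
    print(candidate_pairs(docs))
```

Only documents that agree on every row of at least one band land in the same bucket, so the expensive pairwise comparison is limited to those candidates. This is the sense in which LSH decreases the number of comparisons, and because each band key is processed independently, the reduce step is the part that parallelizes naturally.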
