A Framework for Evaluating Design Methodologies for Big Data Warehouses: Measurement of the Design Process

This article describes how the evaluation of modern data warehouses considers the new solutions adopted to cope with the radical changes caused by the need to reduce storage volume while increasing the velocity of multidimensional design and data elaboration, even in the presence of unstructured data that are useful for providing qualitative information. The aim is to set up a framework for evaluating the physical and methodological characteristics of a data warehouse, built by considering the factors that affect the data warehouse's lifecycle in light of the Big Data issues of Volume, Velocity, Variety, Value, and Veracity. The contribution is the definition of a set of criteria for classifying Big Data Warehouses on the basis of their methodological characteristics. Based on these criteria, the authors define a set of metrics for measuring the quality of Big Data Warehouses with reference to the design specifications. They show through a case study how the proposed metrics can check the eligibility of methodologies falling into different classes in the Big Data context.
