Evaluating Several Design Patterns and Trends in Big Data Warehousing Systems

The Big Data characteristics, namely volume, variety and velocity, highlight the severe limitations of traditional Data Warehouses (DWs). Their strict relational model, costly scalability, and sometimes inefficient performance open the way for emerging techniques and technologies. The concept of Big Data Warehousing has recently been gaining traction, aiming to study and propose new ways of dealing with Big Data challenges in Data Warehousing contexts. The Big Data Warehouse (BDW) can be seen as a flexible, scalable and highly performant system that uses Big Data techniques and technologies to support mixed and complex analytical workloads (e.g., streaming analysis, ad hoc querying, data visualization, data mining, simulations) in several emerging contexts such as Smart Cities and Industry 4.0. However, because the topic is still at an almost embryonic stage, ambiguity in its constructs and a lack of common approaches still prevail. In this paper, we discuss and evaluate some design patterns and trends in Big Data Warehousing systems, including data modelling techniques (e.g., star schemas, flat tables, nested structures) and some streaming considerations for BDWs (e.g., Hive vs. NoSQL databases), aiming to foster and align future research and to help practitioners in this area.
