Enhancing Big Data Warehousing for Efficient, Integrated and Advanced Analytics - Visionary Paper

The existing capacity to collect, store, process and analyze huge amounts of rapidly generated data, i.e., Big Data, is characterized by fast technological developments and by a limited set of conceptual advances to guide researchers and practitioners in the implementation of Big Data systems. New data stores and processing tools appear frequently, proposing new (and usually more efficient) ways to store and query data, such as SQL-on-Hadoop. Although these technological developments are very relevant, the lack of common methodological guidelines or practices has led to Big Data systems being implemented through use-case-driven approaches. This is also the case for one of the most valuable organizational data assets, the Data Warehouse, which needs to be rethought in the way it is designed, modeled, implemented, managed and monitored. This paper addresses some of the research challenges in Big Data Warehousing systems, proposing a vision that looks into: (i) the integration of new business processes and data sources; (ii) the proper way to achieve this integration; (iii) the management of these complex data systems and the enhancement of their performance; (iv) the automation of some of their analytical capabilities with Complex Event Processing and Machine Learning; and (v) the flexible and highly customizable visualization of their data, providing an advanced decision-making support environment.
