The SusCity Big Data Warehousing Approach for Smart Cities

The Smart City concept provides a rich analytical context, highlighting the need to store and process vast amounts of heterogeneous data flowing at different velocities. Such data is commonly described as Big Data, and it poses significant difficulties for traditional data techniques and technologies. Data Warehouses (DWs) have long been recognized as a fundamental enterprise asset, providing fact-based decision support for organizations, but the concept of the DW is evolving. Traditionally, Relational Database Management Systems (RDBMSs) are used to store historical data, providing different analytical perspectives on several business processes. With current advancements in Big Data techniques and technologies, the concept of the Big Data Warehouse (BDW) has emerged to overcome several limitations of traditional DWs. This paper presents a novel approach for designing and implementing BDWs, which has been supporting the SusCity data visualization platform. The BDW is a crucial component of the SusCity research project in the context of Smart Cities, supporting analytical tasks based on data collected in the city of Lisbon.
