A Metadata Framework for Data Lagoons

In this work, we present a Metadata Framework aimed at extending intelligence mechanisms from the Cloud to the Edge. To this end, we build on our previously introduced notion of Data Lagoons, the analogue of Data Lakes at the network edge, and we introduce a novel architecture and Metadata model for efficient interaction between Data Lagoons and Data Lakes. We identify the service and data planes of our architecture and illustrate the application of our framework on a use case from the TPCx-IoT benchmark. To our knowledge, our approach is the first to examine the integration of Data Lakes with Edge components while taking into consideration the data and infrastructure resources of Edge Nodes.
