IoT streaming data integration from multiple sources

The Internet of Things (IoT) has recently received considerable interest due to the development of smart technologies in today’s interconnected world. With the rapid advancement of Internet technologies and the proliferation of IoT sensors, myriad systems and applications generate data of massive volume, variety and velocity, which traditional databases and systems are unable to manage effectively. Many organizations need to deal with these massive datasets, which comprise different types of data (e.g., IoT streaming data, static data) in different formats (e.g., structured, semi-structured) coming from multiple sources. Several data integration mechanisms have been designed, but they mostly process static data and are unable to integrate IoT streaming datasets from multiple sources. In this paper, we identify the challenges of IoT Streaming Data Integration (ISDI) and present a formal approach for the real-time integration of such IoT streaming datasets. We address one important issue: timing conflict/alignment among streaming data coming from multiple sources. We propose a generic window-based ISDI approach to deal with IoT data in different formats, and we develop algorithms to integrate IoT streaming data from multiple sources. In particular, we extend the basic windowing algorithm to support real-time data integration and to deal with the timing alignment issue. We also introduce a de-duplication algorithm to deal with data redundancy and to demonstrate the useful fragments of the integrated data. We conduct several sets of experiments to quantify the performance of our proposed window-based approach. In particular, we compare our local experimental results with a real setup for streaming data using Apache Spark. The results of the experiments, performed on several IoT datasets, show the efficiency of our proposed solution in terms of processing time. The results are also used to provide an integrated data view to the users.
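
To make the idea of window-based integration with timing alignment and de-duplication more concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm developed in the paper: the `Reading` record, the `integrate` function, the use of fixed-size (tumbling) windows, and the rule that two readings are duplicates when they share sensor id, timestamp and value are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Reading(NamedTuple):
    """Hypothetical record for one IoT reading (not taken from the paper)."""
    source: str       # name of the originating stream
    sensor_id: str    # identifier of the physical sensor
    timestamp: float  # event time in seconds
    value: float


def integrate(readings: Iterable[Reading],
              window_size: float) -> Dict[float, List[Reading]]:
    """Align readings from several sources on fixed-size windows.

    Each reading is assigned to the window whose start is the largest
    multiple of `window_size` not exceeding its timestamp. Within a
    window, readings with the same (sensor_id, timestamp, value) are
    treated as duplicates and only the first one seen is kept
    (an assumed de-duplication rule).
    """
    windows: Dict[float, Dict[Tuple[str, float, float], Reading]] = defaultdict(dict)
    for r in readings:
        window_start = (r.timestamp // window_size) * window_size
        dedup_key = (r.sensor_id, r.timestamp, r.value)
        # setdefault keeps the first reading and ignores later duplicates
        windows[window_start].setdefault(dedup_key, r)
    # Return windows in time order, each with its readings sorted
    return {
        start: sorted(bucket.values(), key=lambda r: (r.timestamp, r.sensor_id))
        for start, bucket in sorted(windows.items())
    }


# Example input: two sources report with different timing, and source "b"
# repeats a reading that source "a" has already delivered.
stream = [
    Reading("a", "s1", 1.0, 20.5),
    Reading("b", "s2", 3.5, 40.0),
    Reading("b", "s1", 1.0, 20.5),   # duplicate of the first reading
    Reading("a", "s1", 12.0, 21.0),
]
integrated = integrate(stream, window_size=10.0)
```

In this sketch the window start acts as the common time reference for all sources, so readings that arrive with different timing still land in the same integrated fragment. A streaming deployment would additionally need to handle late or out-of-order arrivals, which this batch-style example does not attempt.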
