A Bottom-Up Tree Based Storage Approach for Efficient IoT Data Analytics in Cloud Systems

The Internet of Things (IoT) has been widely applied in domains such as environmental monitoring, intelligent transportation systems, and video surveillance. In most IoT applications, data are generated by many data sources rather than a single one. In addition, IoT data come in various types with different processing requirements, and high-priority IoT data should receive better storage and processing treatment than low-priority IoT data. The objective of this paper is to propose an efficient cloud storage approach that accounts for these multi-aspect requirements of IoT data. In the approach, a lightweight data structure depicts the distribution of each IoT subset (type) across all data sources and calculates its size. We then form a number of storage-locality groups from cloud storage blocks. These groups, however, differ in storage size and locality capability, and we would like to place high-priority IoT subsets in groups with strong locality capability. This gives rise to a combinatorial placement problem between IoT subsets and storage-locality groups. To solve this placement problem efficiently, we propose a bottom-up tree-based approach built on solutions to the well-known knapsack problem. Because the knapsack problem is NP-hard, we also propose a heuristic placement approach.
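The abstract does not spell out the bottom-up tree or the heuristic, so what follows is only a minimal Python sketch of the generic ideas it names, under stated assumptions. Per-source subset sizes come from a hypothetical `stats` dictionary that stands in for the lightweight distribution structure, each subset carries a hypothetical integer priority, and each storage-locality group is modeled as a hypothetical `(name, capacity, locality)` triple. `knapsack_select` is the textbook 0/1 knapsack dynamic program, `greedy_select` is a generic priority-per-size heuristic, and `place_all` fills groups from strongest to weakest locality with one selection pass each. None of these names come from the paper, and this is not the authors' tree-based algorithm.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: (subset name, size, priority); sizes are positive integers.
Item = Tuple[str, int, int]
# Hypothetical model: (group name, capacity, locality score).
Group = Tuple[str, int, float]


def subset_sizes(stats: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the size of each IoT subset (type) over all data sources."""
    totals: Dict[str, int] = {}
    for per_subset in stats.values():
        for subset, size in per_subset.items():
            totals[subset] = totals.get(subset, 0) + size
    return totals


def knapsack_select(items: List[Item], capacity: int) -> Tuple[int, List[str]]:
    """Exact 0/1 knapsack: maximize total priority within an integer capacity."""
    n = len(items)
    # best[i][c] = best total priority using the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, prio) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if size <= c and best[i - 1][c - size] + prio > best[i][c]:
                best[i][c] = best[i - 1][c - size] + prio
    # Walk the table backwards to recover which items were taken.
    chosen: List[str] = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, size, _ = items[i - 1]
            chosen.append(name)
            c -= size
    return best[n][capacity], sorted(chosen)


def greedy_select(items: List[Item], capacity: int) -> Tuple[int, List[str]]:
    """Heuristic: take items by decreasing priority per unit size while they fit."""
    chosen: List[str] = []
    used = total = 0
    for name, size, prio in sorted(items, key=lambda t: t[2] / t[1], reverse=True):
        if used + size <= capacity:
            chosen.append(name)
            used += size
            total += prio
    return total, sorted(chosen)


def place_all(groups: List[Group], items: List[Item],
              select: Callable[[List[Item], int], Tuple[int, List[str]]] = knapsack_select
              ) -> Dict[str, List[str]]:
    """Fill groups from strongest to weakest locality, one selection pass per group."""
    remaining = list(items)
    placement: Dict[str, List[str]] = {}
    for gname, capacity, _locality in sorted(groups, key=lambda g: g[2], reverse=True):
        _, chosen = select(remaining, capacity)
        placement[gname] = chosen
        remaining = [it for it in remaining if it[0] not in chosen]
    return placement


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    stats = {"s1": {"temp": 4, "traffic": 12},
             "s2": {"temp": 6, "video": 18},
             "s3": {"traffic": 8, "video": 12}}
    totals = subset_sizes(stats)  # {'temp': 10, 'traffic': 20, 'video': 30}
    priority = {"temp": 60, "traffic": 100, "video": 120}
    items = [(name, totals[name], priority[name]) for name in totals]
    print(knapsack_select(items, 50))  # (220, ['traffic', 'video'])
    print(greedy_select(items, 50))    # (160, ['temp', 'traffic'])
    print(place_all([("g_weak", 40, 0.3), ("g_strong", 50, 0.9)], items))
    # {'g_strong': ['traffic', 'video'], 'g_weak': ['temp']}
```

With these toy numbers the exact dynamic program reaches a total priority of 220, while the greedy heuristic stops at 160, which shows the kind of accuracy a cheaper heuristic can give up. The dynamic program runs in O(n·C) time for n subsets and integer capacity C; this is pseudo-polynomial, which is why a heuristic becomes attractive when capacities are large. Filling groups one at a time in locality order is itself only an approximation of the multi-group placement problem.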
