Dependable large scale behavioral patterns mining from sensor data using Hadoop platform

Abstract

Wireless sensor networks (WSNs) will be an integral part of the future Internet of Things (IoT) environment and will generate large volumes of data. However, these data are only of benefit if useful knowledge can be mined from them. A data mining framework for WSNs comprises data extraction, storage and mining techniques, and must be both efficient and dependable. In this paper, we propose a new type of behavioral pattern to be mined from sensor data, called regularly frequent sensor patterns (RFSPs). RFSPs identify sets of temporally correlated sensors that can reveal significant knowledge from the monitored data. We also propose a distributed data extraction model that prepares the data required for mining RFSPs; the distributed scheme ensures higher availability through greater redundancy. The tree structure for RFSPs (RFSP-tree) is compact, requires less memory and can be constructed with a single scan of the dataset, and the associated mining technique is efficient, with low runtime. Existing techniques for mining sensor data in the literature follow a sequential approach on a single memory and hence are not efficient. Moreover, the use of the MapReduce model for distributed solutions has not been explored extensively. Since MapReduce is becoming the de facto model for computation on large data, we also propose a parallel implementation of the RFSP mining algorithm, called RFSP on Hadoop (RFSP-H), which uses a MapReduce-based framework to gain further efficiency. Experiments evaluating the compactness and performance of the data extraction model, the RFSP-tree and RFSP-H mining show improved results.
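To make the notion of a regularly frequent pattern concrete, the following is a minimal sketch, not the RFSP-tree or RFSP-H algorithms themselves. It assumes the support/regularity definitions common in regular-pattern mining: the sensor stream is split into numbered epochs, the support of a sensor-set is the number of epochs containing it, and its regularity is the largest gap between consecutive occurrences (counting from epoch 0 and up to the last epoch). A sensor-set is kept if support ≥ `min_sup` and regularity ≤ `max_reg`. The map/shuffle/reduce phases are simulated in-process with plain functions (not the Hadoop API), candidate sensor-sets are found by brute-force subset enumeration rather than a tree, and the data and all names (`EPOCHS`, `map_epoch`, `reduce_pattern`, `mine_rfsp`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sensor database: epoch id -> sensors that reported an event
# in that epoch. Epoch ids are assumed to run consecutively from 1.
EPOCHS = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"b", "c"},
    4: {"a", "b", "d"},
    5: {"a", "c"},
    6: {"a", "b"},
}


def map_epoch(tid, sensors, max_len):
    """Map phase (simulated): emit (sensor-set, epoch id) for every
    subset of the epoch's sensors with at most max_len members."""
    for k in range(1, max_len + 1):
        for combo in combinations(sorted(sensors), k):
            yield combo, tid


def regularity(tids, last_tid):
    """Largest gap between consecutive occurrences, including the gap
    from epoch 0 to the first occurrence and from the last occurrence
    to the final epoch (assumed definition)."""
    points = [0] + sorted(tids) + [last_tid]
    return max(b - a for a, b in zip(points, points[1:]))


def reduce_pattern(pattern, tids, last_tid, min_sup, max_reg):
    """Reduce phase (simulated): keep the sensor-set only if it is both
    frequent (support >= min_sup) and regular (regularity <= max_reg)."""
    sup = len(tids)
    reg = regularity(tids, last_tid)
    if sup >= min_sup and reg <= max_reg:
        return pattern, sup, reg
    return None


def mine_rfsp(epochs, min_sup, max_reg, max_len=3):
    # Shuffle step (simulated): group emitted epoch ids by sensor-set key.
    groups = defaultdict(list)
    for tid, sensors in epochs.items():
        for pattern, t in map_epoch(tid, sensors, max_len):
            groups[pattern].append(t)
    last_tid = max(epochs)
    results = []
    for pattern, tids in groups.items():
        kept = reduce_pattern(pattern, tids, last_tid, min_sup, max_reg)
        if kept is not None:
            results.append(kept)
    return sorted(results)


if __name__ == "__main__":
    for pattern, sup, reg in mine_rfsp(EPOCHS, min_sup=3, max_reg=2):
        print(pattern, sup, reg)
```

With `min_sup=3` and `max_reg=2`, the sketch keeps `('a',)`, `('a', 'b')`, `('b',)` and `('c',)`; `('a', 'c')` is dropped because it occurs in only two epochs, even though its sensors appear often individually. Enumerating every subset grows exponentially with the number of sensors per epoch, which is the kind of cost a compact single-scan tree structure and a MapReduce-based parallel implementation aim to avoid.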
