A novel learning method to classify data streams in the internet of things

Data streams are high-volume, multi-dimensional, unlabeled data generated in settings such as stock markets, astronomical observation, weblogs, clickstreams, and flood, fire, and crop monitoring. Knowledge discovery in data streams is a valuable task for research, business, and the community. The fundamental step of knowledge discovery in a data stream is classifying the stream into target classes. In this paper we propose a classification mechanism for data streams. Conventional classification algorithms are of limited use on data streams because of the streams' complex nature, unbounded memory requirements, and concept drift. The proposed method takes a novel approach: it applies an unsupervised technique, clustering, followed by a supervised classifier, the Support Vector Machine (SVM). The high-volume data is first sampled and reduced with Symbolic Aggregate Approximation (SAX). The density-based clustering algorithm DBSCAN is then applied to the data stream to reveal the number of classes present and to label the data. SVM, a well-known and proven supervised classification algorithm, is then trained to classify the labeled data. We tested the proposed method on the Intel Lab Data set, which contains four environmental variables (temperature, voltage, humidity, and light) collected by 54 Mica2Dot sensors over 36 days at a per-second rate. We sampled the data stream by day and trained the SVM classifier on a window of a certain size n. The algorithm was evaluated on different test sizes and obtained an average accuracy of 80%.
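
To illustrate the reduction step, the following is a minimal Python sketch of SAX. It assumes the standard formulation: z-normalize the window, average it into equal-width frames (piecewise aggregate approximation), then map each frame to a symbol using equiprobable Gaussian breakpoints. The function names, the four-letter alphabet, and the sample readings are hypothetical and are not taken from the paper's implementation.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a window to zero mean and unit variance."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant window carries no shape information.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-width frame."""
    if len(series) % n_segments != 0:
        raise ValueError("window length must be a multiple of n_segments")
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax(series, n_segments, alphabet="abcd"):
    """Reduce a numeric window to a short symbolic word."""
    k = len(alphabet)
    # Breakpoints split the standard normal curve into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    reduced = paa(znormalize(series), n_segments)
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in reduced)


# Hypothetical temperature readings from one sensor window.
readings = [20.0, 20.0, 21.0, 21.0, 25.0, 25.0, 30.0, 30.0]
word = sax(readings, n_segments=4)
print(word)  # aacd
```

Here eight readings shrink to a four-symbol word. In a pipeline like the one described above, such reduced windows would then be clustered to obtain labels before a classifier is trained on them.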
