Multivariate Stream Data Classification Using Simple Text Classifiers

We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it applies a simple text classification algorithm to the discretized window. We evaluated both supervised and unsupervised classification algorithms: Naive Bayes and SVM for the supervised case, and Jaccard, TF-IDF, Jaro, and Jaro-Winkler for the unsupervised case. In our experiments, SVM and TF-IDF outperformed the other classification methods. In particular, classification accuracy improved when the correlation among attributes was considered along with the n-gram tokens of symbols.
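The two steps can be illustrated with a minimal sketch. It is not the implementation evaluated here: the three-symbol alphabet (U, D, S), the change threshold, the attribute-prefixed tokens, and the nearest-match labeling against a few reference windows are all assumptions made for illustration. Only the overall shape (discretize a window, tokenize into n-grams, compare with a text similarity such as Jaccard) follows the description above.

```python
def discretize(values, threshold=0.5):
    """Encode consecutive changes as symbols: U (up), D (down), S (steady).

    The alphabet and threshold are illustrative assumptions.
    """
    symbols = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > threshold:
            symbols.append("U")
        elif delta < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def ngrams(text, n=2):
    """Return the set of character n-grams of a symbol string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def window_tokens(window, n=2):
    """Turn a window (attribute name -> list of values) into a token set.

    Each n-gram is prefixed with its attribute name so that tokens from
    different attributes stay distinct (an assumed convention).
    """
    tokens = set()
    for name, values in window.items():
        for gram in ngrams(discretize(values), n):
            tokens.add(f"{name}:{gram}")
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(window, references, n=2):
    """Return the label of the most similar reference window.

    `references` is a list of (label, window) pairs; this nearest-match
    scheme is a hypothetical stand-in for the classification step.
    """
    tokens = window_tokens(window, n)
    best_label, best_score = None, -1.0
    for label, ref_window in references:
        score = jaccard(tokens, window_tokens(ref_window, n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


references = [
    ("rising", {"temp": [1, 2, 3, 4], "hum": [4, 3, 2, 1]}),
    ("falling", {"temp": [4, 3, 2, 1], "hum": [1, 2, 3, 4]}),
]
query = {"temp": [10, 12, 13, 15], "hum": [9, 7, 6, 4]}
print(classify(query, references))
```

Prefixing tokens with the attribute name is one simple way to keep per-attribute patterns separate; tokens that combine the symbols of several attributes at the same time step would be one way to also capture correlation among attributes.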
