Multivariable stream data classification using motifs and their temporal relations

Multivariable stream data is becoming increasingly common as diverse types of sensor devices and networks are deployed. Building accurate classification models for such data has attracted considerable attention from the research community. Most previous work, however, relies on features extracted from individual streams and does not take into account the dependency relations among features within and across streams. In this work, we propose new classification models that exploit temporal relations among features. We show that considering such dependencies significantly improves classification accuracy. Another benefit of employing temporal relations is improved interpretability of the resulting classification models: the set of temporal relations can be easily translated into a rule expressed as a sequence of inter-dependent events characterizing the class. We evaluate the proposed scheme using different classification models, including naive Bayes, TF-IDF, and vector distance models. Our results show that the proposed model can be a useful addition to the set of existing stream classification algorithms.
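To make the idea of temporal relations among features concrete, the following is a minimal sketch, not the method used in this work. It assumes that motif occurrences have already been extracted from each stream as labeled time intervals, and it uses a coarse set of interval relations (before, overlaps, contains, during, equals) chosen here purely for illustration. The names `MotifOccurrence`, `relation`, and `relation_features`, as well as the stream and motif labels, are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class MotifOccurrence(NamedTuple):
    """One occurrence of a motif in one stream (hypothetical representation)."""
    stream: str  # name of the stream the motif was found in
    motif: str   # label of the motif
    start: int   # first time step covered by the occurrence
    end: int     # last time step covered by the occurrence (inclusive)


def relation(a: MotifOccurrence, b: MotifOccurrence) -> str:
    """Return a coarse temporal relation of interval a with respect to b.

    Assumption: this reduced relation set is illustrative only and is
    not taken from the paper.
    """
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start <= b.start and b.end <= a.end:
        return "contains"
    if b.start <= a.start and a.end <= b.end:
        return "during"
    return "overlaps"


def relation_features(occurrences: list[MotifOccurrence]) -> Counter:
    """Count (motif, relation, motif) triples within and across streams."""
    ordered = sorted(occurrences, key=lambda o: (o.start, o.end))
    features: Counter = Counter()
    for a, b in combinations(ordered, 2):
        key = (f"{a.stream}:{a.motif}", relation(a, b), f"{b.stream}:{b.motif}")
        features[key] += 1
    return features


# Hypothetical example: three motif occurrences from two sensor streams.
example = [
    MotifOccurrence("temp", "rise", 0, 4),
    MotifOccurrence("pressure", "drop", 5, 9),
    MotifOccurrence("temp", "plateau", 3, 12),
]
example_features = relation_features(example)
```

Each resulting triple, such as a "rise" motif in one stream occurring before a "drop" motif in another, reads directly as one step of an interpretable rule, and the triple counts could serve as input features for a classifier such as naive Bayes.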
