CBS: A New Classification Method by Using Sequential Patterns

Data classification is an important topic in data mining because of its wide range of applications. Many classification methods have been proposed based on well-known learning models such as decision trees and neural networks. However, these methods may not perform well on time-sequence datasets such as time-series gene expression data. In this paper, we propose a new data mining method, Classify-By-Sequence (CBS), for classifying large time-series datasets. The main idea of CBS is to integrate sequential pattern mining with probabilistic induction, so that the inherent sequential patterns can be extracted efficiently and the classification task can be performed more accurately. CBS also has the merit of being simple to implement. Experimental evaluation shows that CBS substantially outperforms other methods in classification accuracy.
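
To make the general idea of classifying by sequential patterns concrete, the following is a minimal illustrative sketch and not the CBS algorithm itself. It assumes each time series has already been discretized into symbols (here the hypothetical levels "L", "M", "H"), mines frequent subsequences per class by brute force, and scores a new sequence with a simple support-weighted sum that stands in for the probabilistic induction step. All function names, the scoring rule, and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_patterns(sequences, min_support, max_len=3):
    """Brute-force mining of subsequences with support >= min_support.

    Assumption: exhaustive enumeration is only practical for short toy
    sequences; a real miner would use a dedicated sequential pattern algorithm.
    """
    counts = Counter()
    for seq in sequences:
        seen = set()
        for k in range(1, max_len + 1):
            for idx in combinations(range(len(seq)), k):
                seen.add(tuple(seq[i] for i in idx))
        counts.update(seen)  # each sequence contributes at most once per pattern
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def train(labeled_sequences, min_support=0.6):
    """Mine one pattern set per class label."""
    by_class = defaultdict(list)
    for seq, label in labeled_sequences:
        by_class[label].append(seq)
    return {label: mine_patterns(seqs, min_support)
            for label, seqs in by_class.items()}


def classify(model, sequence):
    """Pick the class whose matched patterns give the highest score.

    Assumption: score = sum of support * pattern length over matched
    patterns; this is a hypothetical stand-in, not the paper's formula.
    """
    scores = {
        label: sum(support * len(p)
                   for p, support in patterns.items()
                   if is_subsequence(p, sequence))
        for label, patterns in model.items()
    }
    return max(scores, key=scores.get)


# Toy, made-up training data: discretized expression levels over time.
training = [
    (["L", "M", "H", "H"], "up"),
    (["L", "L", "M", "H"], "up"),
    (["L", "M", "M", "H"], "up"),
    (["H", "M", "L", "L"], "down"),
    (["H", "H", "M", "L"], "down"),
    (["H", "M", "M", "L"], "down"),
]
model = train(training)
print(classify(model, ["L", "M", "H"]))
print(classify(model, ["H", "M", "L"]))
```

On this toy data the rising sequence is assigned to "up" and the falling one to "down", because ordered patterns such as ("L", "M", "H") are frequent in only one class. The point of the sketch is that order-sensitive patterns carry class information that an unordered feature vector would lose.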