An Efficient Ensemble Sequence Classifier

Classification techniques learn from historical data to predict the class labels of new data, and they have been applied to many problems. Real-world applications produce large amounts of sequence data, such as genome sequences, that must be learned and analyzed in order to predict class labels. Traditional classification methods are unsuitable for sequence data. This study proposes an Ensemble Sequence Classifier (ESC). The ESC consists of two stages. The first stage generates a Sequence Classifier based on Pattern Coverage Rate (SC-PCR) in two phases. The first phase mines sequential patterns and builds the features of each class, whereas the second phase classifies sequences based on class scores computed from a pattern coverage rate. The second stage combines several classifiers built in the first stage into an ensemble classifier to improve prediction accuracy. The experimental results confirm that the SC-PCR and ESC schemes achieve high classification accuracies on both synthetic and medical sequence datasets, even when the training set contains only a limited number of sequential patterns. The average and worst accuracies of SC-PCR are 95.8% and 80.3%, respectively. The average and worst accuracies of ESC are 96.97% and 87%, respectively.
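The two-stage structure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the naive level-wise miner `mine_patterns` stands in for whatever sequential pattern mining algorithm the paper uses, the coverage rate is taken to be the fraction of a class's mined patterns that occur as subsequences of the input, and the ensemble is formed by majority vote over classifiers trained with different minimum-support thresholds. All function and class names are hypothetical.

```python
from collections import Counter


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_patterns(sequences, min_support, max_len=3):
    """Naive level-wise mining of frequent subsequences.

    Assumption: a simple stand-in for a real sequential pattern miner.
    """
    n = len(sequences)

    def support(p):
        return sum(is_subsequence(p, s) for s in sequences) / n

    items = sorted({x for s in sequences for x in s})
    level = [(x,) for x in items if support((x,)) >= min_support]
    singles = [p[0] for p in level]
    frequent = []
    while level and len(level[0]) <= max_len:
        frequent.extend(level)
        # Extend each frequent pattern by one frequent item and keep the frequent results.
        level = [p + (x,) for p in level for x in singles
                 if support(p + (x,)) >= min_support]
    return frequent


class CoverageClassifier:
    """Stage 1 (assumed form): per-class patterns, scored by coverage rate."""

    def __init__(self, min_support):
        self.min_support = min_support
        self.patterns = {}

    def fit(self, sequences, labels):
        # Phase 1: mine the sequential patterns of each class separately.
        for label in sorted(set(labels)):
            members = [s for s, y in zip(sequences, labels) if y == label]
            self.patterns[label] = mine_patterns(members, self.min_support)
        return self

    def scores(self, sequence):
        # Phase 2: assumed coverage rate = matched class patterns / all class patterns.
        return {
            label: (sum(is_subsequence(p, sequence) for p in pats) / len(pats)
                    if pats else 0.0)
            for label, pats in self.patterns.items()
        }

    def predict(self, sequence):
        s = self.scores(sequence)
        return max(s, key=s.get)


def ensemble_predict(classifiers, sequence):
    """Stage 2 (assumed form): majority vote over stage-1 classifiers."""
    votes = Counter(c.predict(sequence) for c in classifiers)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
train_sequences = ["abc", "aabc", "abcb", "cba", "ccba", "cbab"]
train_labels = ["A", "A", "A", "B", "B", "B"]
ensemble = [CoverageClassifier(t).fit(train_sequences, train_labels)
            for t in (1.0, 0.8, 0.6)]
print(ensemble_predict(ensemble, "abc"), ensemble_predict(ensemble, "cba"))
```

Varying the support threshold is only one way to obtain diverse base classifiers; resampling the training set, as in bagging, would be another, and the sketch does not claim to match the paper's choice.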
