Customer Activity Sequence Classification for Debt Prevention in Social Security

From a data mining perspective, sequence classification builds a classifier from frequent sequential patterns. However, mining a complete set of sequential patterns from a large dataset can be extremely time-consuming, and the large number of patterns discovered makes pattern selection and classifier building very time-consuming as well. In sequence classification, it is much more important to discover discriminative patterns than a complete pattern set. In this paper, we propose a novel hierarchical algorithm that builds sequential classifiers from discriminative sequential patterns. First, we mine the sequential patterns that are most strongly correlated with each target class, using an aggressive strategy to select a small set of patterns. Second, we apply pattern pruning and a serial coverage test to the mined patterns; the patterns that pass the serial test form the sub-classifier at the first level of the final classifier. Third, the training samples that remain uncovered are fed back to the sequential pattern mining stage with updated parameters. This process continues until predefined interestingness measure thresholds are reached or all samples are covered. The patterns generated in each loop form the sub-classifier at the corresponding level of the final classifier. Within this framework, the search space is reduced dramatically while good classification performance is achieved. The proposed algorithm is tested in a real-world business application for debt prevention in the social security domain, where it proves effective and efficient at predicting debt occurrences from customer activity sequence data.
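
The mine, prune, and feed-back loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: rule confidence stands in for the unspecified correlation measure, "updated parameters" is read as lowering an absolute minimum support by one per loop, candidate patterns are enumerated by brute force up to a fixed length, and all function names, parameters (max_len, min_support, top_k, support_floor), and activity codes (ADR, INC, DOC, REV) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def mine_patterns(samples, max_len, min_support, top_k):
    """Return at most top_k (pattern, label, confidence) rules per class.

    Assumption: confidence P(label | pattern) replaces the paper's
    correlation measure, and patterns are enumerated exhaustively.
    """
    counts = {}  # pattern -> Counter of class labels among samples containing it
    for sequence, label in samples:
        seen = set()
        for length in range(1, max_len + 1):
            for idx in combinations(range(len(sequence)), length):
                seen.add(tuple(sequence[i] for i in idx))
        for pattern in seen:
            counts.setdefault(pattern, Counter())[label] += 1

    rules = []
    for pattern, by_label in counts.items():
        total = sum(by_label.values())
        for label, n in by_label.items():
            if n >= min_support:
                rules.append((pattern, label, n / total))
    # Rank by confidence, then prefer longer patterns; ties broken by name.
    rules.sort(key=lambda r: (-r[2], -len(r[0]), r[0], r[1]))

    # Aggressive selection: keep only the top_k best rules for each class.
    selected, per_class = [], Counter()
    for rule in rules:
        if per_class[rule[1]] < top_k:
            selected.append(rule)
            per_class[rule[1]] += 1
    return selected


def serial_coverage(rules, samples):
    """Keep a rule only if it correctly covers a still-uncovered sample."""
    kept, remaining = [], list(samples)
    for pattern, label, score in rules:
        still_uncovered, hit = [], False
        for sequence, true_label in remaining:
            if true_label == label and is_subsequence(pattern, sequence):
                hit = True
            else:
                still_uncovered.append((sequence, true_label))
        if hit:
            kept.append((pattern, label, score))
            remaining = still_uncovered
    return kept, remaining


def build_classifier(samples, max_len=2, min_support=3, top_k=2, support_floor=1):
    """Build one sub-classifier per loop from the samples left uncovered."""
    levels, remaining = [], list(samples)
    while remaining and min_support >= support_floor:
        rules = mine_patterns(remaining, max_len, min_support, top_k)
        kept, remaining = serial_coverage(rules, remaining)
        if kept:
            levels.append(kept)
        min_support -= 1  # assumed parameter update: relax the threshold
    default = Counter(label for _, label in samples).most_common(1)[0][0]
    return levels, default


def predict(classifier, sequence):
    """Return the label of the first matching rule, searching level by level."""
    levels, default = classifier
    for level in levels:
        for pattern, label, _ in level:
            if is_subsequence(pattern, sequence):
                return label
    return default


# Toy activity sequences with made-up activity codes.
train = [
    (["ADR", "INC", "DOC"], "debt"),
    (["ADR", "INC"], "debt"),
    (["INC", "ADR", "INC"], "debt"),
    (["DOC", "REV"], "no_debt"),
    (["REV", "DOC", "REV"], "no_debt"),
    (["REV"], "no_debt"),
    (["DOC", "INC"], "debt"),
]
classifier = build_classifier(train)
print(predict(classifier, ["ADR", "REV", "INC"]))
print(predict(classifier, ["REV", "DOC"]))
```

On this toy data the first loop covers six of the seven samples, and the remaining one is only covered after the support threshold has been relaxed, which yields a second-level sub-classifier. Exhaustive enumeration is used here purely for brevity; it would not scale to the dataset sizes the abstract has in mind.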
