A new approach for mining deep order-preserving submatrices

In this paper, we propose an exact method for discovering all order-preserving submatrices (OPSMs) based on frequent sequential pattern mining. First, the existing algorithm calACS is adapted to enumerate all common subsequences between every pair of row sequences, so that deep OPSMs, which correspond to long patterns supported by few sequences, are not missed. Next, an improved prefix-tree data structure is used to store and traverse the common subsequences, and the Apriori principle is applied to mine frequent sequential patterns efficiently. Finally, experiments are conducted on a real data set, and Gene Ontology (GO) analysis is applied to assess whether the discovered patterns are biologically significant. The results demonstrate the effectiveness and efficiency of the method.
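
To make the pipeline concrete, the following is a minimal Python sketch of the general idea only, not the method described in the abstract: it does not use calACS, a prefix tree, or Apriori pruning. Each row is converted to the sequence of column indices sorted by value, the common subsequences of every pair of rows are enumerated by brute force, and the supporting rows of each pattern are collected. The function names (`row_to_sequence`, `common_subsequences`, `mine_opsms`) and the thresholds `min_rows` and `min_cols` are hypothetical, and the sketch assumes that the values within a row are distinct.

```python
from itertools import combinations


def row_to_sequence(row):
    """Return the column indices of a row, ordered by ascending value."""
    return tuple(sorted(range(len(row)), key=lambda j: row[j]))


def common_subsequences(seq_a, seq_b, min_len):
    """Enumerate every common subsequence of two column permutations.

    Brute-force enumeration (exponential in the worst case); it stands in
    for a dedicated all-common-subsequences algorithm in this sketch.
    """
    pos_in_b = {col: i for i, col in enumerate(seq_b)}
    found = []

    def extend(start, prefix):
        if len(prefix) >= min_len:
            found.append(tuple(prefix))
        for k in range(start, len(seq_a)):
            col = seq_a[k]
            # Keep col only if it also comes later than the previous
            # column in seq_b, so the order is preserved in both rows.
            if not prefix or pos_in_b[col] > pos_in_b[prefix[-1]]:
                prefix.append(col)
                extend(k + 1, prefix)
                prefix.pop()

    extend(0, [])
    return found


def mine_opsms(matrix, min_rows, min_cols):
    """Map each column pattern to the set of rows that preserve its order."""
    sequences = [row_to_sequence(row) for row in matrix]
    support = {}
    for i, j in combinations(range(len(sequences)), 2):
        for pattern in common_subsequences(sequences[i], sequences[j], min_cols):
            # A plain dict stands in for a prefix tree here.
            support.setdefault(pattern, set()).update((i, j))
    return {p: rows for p, rows in support.items() if len(rows) >= min_rows}


# Toy expression matrix: rows are genes, columns are conditions.
matrix = [
    [1, 2, 3, 4],
    [2, 3, 4, 1],
    [10, 20, 30, 5],
    [4, 3, 2, 1],
]
opsms = mine_opsms(matrix, min_rows=2, min_cols=3)
```

In this toy matrix, the long pattern `(3, 0, 1, 2)` is supported by only two rows, which illustrates why comparing every pair of rows keeps deep OPSMs from being missed: any pattern shared by at least two rows appears as a common subsequence of some pair.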
