An Efficient Approach for Discovering Closed Frequent Patterns in High Dimensional Data Sets

Advances in technology in fields such as e-commerce and bioinformatics have led to the production of large datasets with new characteristics. Microarray datasets, for example, contain a very large number of features (often thousands) but, because of the nature of the application, very few rows. Association rule mining (ARM) can be used to analyze such data and uncover the characteristics hidden in it. However, most state-of-the-art ARM methods cannot efficiently handle datasets with a large number of attributes. In this paper, we propose and implement a modified Carpenter algorithm that uses a different data structure, which yields better time complexity than a straightforward implementation of Carpenter.
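Carpenter avoids the item explosion in microarray-style data by enumerating sets of rows rather than sets of items. The sketch below illustrates only that basic row-enumeration idea. It is not the modified algorithm proposed in this paper, and it omits Carpenter's transposed-table pruning strategies. It assumes the following: rows are modeled as Python sets of item labels, the minimum support is an absolute row count, and the function name `closed_patterns_by_rows` is hypothetical. The search relies on one fact. Every closed itemset equals the intersection of the rows that support it, so intersecting every nonempty row set finds all closed itemsets.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def closed_patterns_by_rows(rows: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Hypothetical sketch: map every closed itemset with support >= minsup to its support.

    Row sets are enumerated depth-first in increasing row order, so the search
    space grows with the number of rows, not the number of items.
    """
    n = len(rows)
    found: Dict[FrozenSet[str], int] = {}

    def support(itemset: FrozenSet[str]) -> int:
        # Count the rows that contain every item of the itemset.
        return sum(1 for r in rows if itemset <= r)

    def dfs(start: int, current: Optional[FrozenSet[str]]) -> None:
        # `current` is the intersection of the rows chosen so far (None at the root).
        for i in range(start, n):
            nxt = frozenset(rows[i]) if current is None else current & rows[i]
            if not nxt:
                # Adding rows can only shrink the intersection, so prune this branch.
                continue
            if nxt not in found:
                found[nxt] = support(nxt)
            dfs(i + 1, nxt)

    dfs(0, None)
    return {itemset: s for itemset, s in found.items() if s >= minsup}


if __name__ == "__main__":
    table = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c", "d"}]
    for itemset, s in sorted(closed_patterns_by_rows(table, 2).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), s)
```

The search is exponential in the number of rows. That cost is tolerable only because datasets of this kind have few rows. Carpenter's pruning strategies, and presumably the modified data structure described in this paper, exist to cut this search down further.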
