Paper information - A FREQUENT DOCUMENT MINING ALGORITHM WITH CLUSTERING

Nowadays, finding association rules in large collections of item-sets has become a very popular problem in the field of data mining. Researchers have implemented many algorithms and techniques to determine association rules. FP-Growth is a very fast algorithm for finding frequent item-sets. This paper presents a new idea in this field: it replaces frequent item-set discovery with frequent sub-graph discovery. It covers the processing of the datasets and describes a modified FP-Growth algorithm for sub-graph discovery. Document clustering is required for this work. A similarity function between pairs of document graphs can be used for clustering with the help of affinity propagation, and the effectiveness of the algorithm can be measured with the F-measure.
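The last two steps of the abstract (pairwise similarity between document graphs, then evaluation by F-measure) can be illustrated with a minimal sketch. The abstract does not state which similarity function or which F-measure variant the paper uses, so everything here is an assumption: each document is represented as a set of hypothetical frequent sub-graph identifiers, the similarity is Jaccard overlap, and the F-measure is the common class-weighted definition for clusterings. The function names are illustrative only.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of frequent sub-graph IDs (assumed measure)."""
    if not a and not b:
        return 1.0  # two empty documents are treated as identical
    return len(a & b) / len(a | b)


def similarity_matrix(docs):
    """Pairwise similarities; a matrix like this is the usual input to affinity propagation."""
    return [[jaccard(x, y) for y in docs] for x in docs]


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure: for each true class take the best F1 over all clusters."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue  # no shared documents, so F1 is 0 for this pair
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


# Hypothetical documents, each described by the frequent sub-graphs it contains.
docs = [{"g1", "g2"}, {"g1", "g2", "g3"}, {"g4"}]
sim = similarity_matrix(docs)

# Hypothetical ground-truth classes versus cluster assignments.
score = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 1])
```

In this sketch the first two documents share two of three sub-graphs and therefore score higher than either does against the third. A perfect clustering yields an F-measure of 1.0, and misplacing documents lowers it.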
