A FREQUENT DOCUMENT MINING ALGORITHM WITH CLUSTERING

Finding association rules in large collections of item-sets has become a popular problem in data mining, and researchers have proposed many algorithms and techniques for it. FP-growth is a fast algorithm for finding frequent item-sets. This paper applies that idea in a different setting: it replaces frequent item-set discovery with frequent sub-graph discovery. It describes the preprocessing of the datasets and a modified FP-growth algorithm for sub-graph discovery. The work also requires document clustering: a similarity function is computed between each pair of document graphs, the resulting similarities are clustered with affinity propagation, and the quality of the clustering is measured with the F-measure.
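The similarity and evaluation steps can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: each document is represented as a set of frequent sub-graph identifiers (here, hypothetical edge labels such as "a-b"), the pairwise similarity is Jaccard similarity, and the F-measure is the common class-weighted clustering F-measure. The function names are illustrative only, and the affinity propagation step itself is not implemented; the matrix produced here is simply the kind of pairwise input such a clustering method consumes.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of sub-graph identifiers.

    Assumption: Jaccard stands in for the paper's similarity function,
    which the abstract does not define.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(docs):
    """Symmetric matrix of pairwise similarities between documents."""
    n = len(docs)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard(docs[i], docs[j])
    return sim


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure of a clustering against known classes.

    For each class, take the best F1 score over all clusters, then
    average those scores weighted by class size.
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Hypothetical toy data: four documents described by frequent sub-graph edges.
docs = [
    {"a-b", "b-c"},
    {"a-b", "b-c", "c-d"},
    {"x-y"},
    {"x-y", "y-z"},
]
sim = similarity_matrix(docs)
score = clustering_f_measure(["A", "A", "B", "B"], [0, 0, 1, 1])
```

In this toy data the first two documents share most of their sub-graphs and the last two share one, so a reasonable clustering would group them as two pairs. A clustering that matches the known classes exactly receives the maximum F-measure.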
