FP-GROWTH BASED NEW NORMALIZATION TECHNIQUE FOR SUBGRAPH RANKING

Frequent itemset discovery algorithms have been used to solve a variety of problems on large volumes of data. As data mining techniques are applied more widely to non-traditional domains, existing approaches for finding frequent itemsets fall short because they cannot satisfy the requirements of these domains. An alternative is to model the objects in such data sets as graphs, which allows arbitrary relations among entities to be represented. Within that model, the problem of finding frequent patterns becomes that of finding subgraphs that occur frequently over the entire set of graphs. In this paper, we present an efficient algorithm for ranking such frequent subgraphs. The proposed ranking method is applied to the FP-growth method for discovering frequent subgraphs. To rank the subgraphs, we present a new normalization technique, a modified form of the normalization applied at each position to the Discounted Cumulative Gain (DCG) of a subgraph. In place of DCG we introduce a modified measure, the Modified Discounted Cumulative Gain (MDCG), which is calculated using "lift". MDCG alone cannot be used to compare performance from one query to the next in a search engine's algorithm, so it must be normalized. To obtain the new normalization, the MDCG of an ideal ordering (IMDCG) is evaluated at each position, and the normalized values are then computed from MDCG and IMDCG. Finally, the values for all rules can be averaged to give the average performance of a ranking algorithm. Ordering the values obtained at each position also gives the order of evaluation of the rules, which in turn yields an efficient ranking of the mined subgraphs.
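The abstract does not state the formulas, so the following is only a minimal sketch of the pipeline it describes, under stated assumptions: the gain at each rank position is taken to be the lift of the rule at that position, the discount follows the standard DCG form (gain divided by log2 of the position, with no discount at position 1), and the normalized value is MDCG divided by IMDCG at each position. The function names `lift`, `mdcg`, `normalized_mdcg`, and `average_normalized` are hypothetical and are not taken from the paper.

```python
from math import log2


def lift(support_xy, support_x, support_y):
    """Lift of a rule X -> Y: P(X and Y) / (P(X) * P(Y))."""
    return support_xy / (support_x * support_y)


def mdcg(gains):
    """Cumulative discounted gain at every position.

    Assumption: standard DCG discounting, with lift values used as gains.
    """
    total = 0.0
    values = []
    for position, gain in enumerate(gains, start=1):
        # Position 1 is not discounted; later positions divide by log2(position).
        total += gain if position == 1 else gain / log2(position)
        values.append(total)
    return values


def normalized_mdcg(gains):
    """MDCG divided by IMDCG (MDCG of the ideal, descending ordering) per position."""
    actual = mdcg(gains)
    ideal = mdcg(sorted(gains, reverse=True))
    return [a / i if i else 0.0 for a, i in zip(actual, ideal)]


def average_normalized(rankings):
    """Average the normalized values across several rankings, position by position."""
    normalized = [normalized_mdcg(g) for g in rankings]
    depth = min(len(n) for n in normalized)
    return [sum(n[p] for n in normalized) / len(normalized) for p in range(depth)]


if __name__ == "__main__":
    # Hypothetical lift values for three rules, in the order a ranker returned them.
    ranked_lifts = [2.0, 1.0, 4.0]
    print(normalized_mdcg(ranked_lifts))
```

A ranking that already lists the rules in descending order of lift scores 1.0 at every position under these assumptions, so values below 1.0 indicate how far a given ordering is from the ideal one.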
