Extracting Diverse Patterns with Unbalanced Concept Hierarchy

Frequent pattern extraction finds interesting associations among the items in a transactional database, using the notion of support to identify frequent patterns. In a given domain, items can normally be grouped into categories, and a pattern may contain items that belong to multiple categories. In several applications, it is useful to distinguish patterns whose items span many categories from patterns whose items belong to one or a few categories. The notion of diversity captures the extent to which the items in a pattern belong to multiple categories. The items and the categories together form a concept hierarchy. An approach has been proposed in the literature to rank patterns by assuming a balanced concept hierarchy. In real-life scenarios, however, concept hierarchies are normally unbalanced. In this paper, we propose a general approach to calculate a diversity-based rank, called drank, that handles unbalanced concept hierarchies. The experimental results show that the ordering of patterns by drank differs from the ordering by support, and that the proposed approach can assign a drank to different kinds of unbalanced patterns.
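To make the terms above concrete, the following minimal Python sketch computes the support of a pattern and counts how many top-level categories its items fall into in a small unbalanced hierarchy. It does not implement drank, whose definition is not given here. The transactions, the hierarchy, and the helper names (`support`, `depth`, `top_category`, `category_spread`) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy data: each transaction is a set of items.
TRANSACTIONS: List[Set[str]] = [
    {"cola", "chips", "shirt"},
    {"cola", "juice"},
    {"cola", "chips"},
    {"shirt", "juice", "cola"},
]

# Hypothetical unbalanced concept hierarchy stored as child -> parent links.
# "cola" and "juice" sit three edges below the root; "chips" and "shirt" only two.
PARENT: Dict[str, str] = {
    "cola": "soft drinks",
    "juice": "soft drinks",
    "soft drinks": "drinks",
    "drinks": "root",
    "chips": "snacks",
    "snacks": "root",
    "shirt": "clothing",
    "clothing": "root",
}


def support(pattern: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of the pattern."""
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def depth(node: str) -> int:
    """Number of edges between a node and the root of the hierarchy."""
    edges = 0
    while node != "root":
        node = PARENT[node]
        edges += 1
    return edges


def top_category(item: str) -> str:
    """Ancestor of the item that is a direct child of the root."""
    node = item
    while PARENT[node] != "root":
        node = PARENT[node]
    return node


def category_spread(pattern: Set[str]) -> int:
    """Illustrative stand-in for diversity: number of distinct top-level categories."""
    return len({top_category(item) for item in pattern})


if __name__ == "__main__":
    for pattern in ({"cola", "juice"}, {"cola", "chips"}):
        print(sorted(pattern), support(pattern, TRANSACTIONS), category_spread(pattern))
```

In this toy data, the patterns {cola, juice} and {cola, chips} have the same support, yet the first stays within one top-level category while the second spans two. This is the kind of distinction that support alone cannot express. The differing leaf depths show what makes the hierarchy unbalanced.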
