Adaptive Cluster Based Discovery of High Utility Itemsets

Utility Itemset Mining (UIM) is a key analysis technique for data which is modeled by the Transactional data model. While improving the computational time and space efficiency of the mining of itemsets is important, it is also critically important to predict future itemsets accurately. In today’s time, when both scientific and business competitive edge is commonly derived from first access to knowledge via advanced predictive ability, this problem becomes increasingly relevant. We established in our most recent work that having prior knowledge of approximate cluster structure of the dataset and using it implicitly in the mining process, can lend itself to accurate prediction of future itemsets. We evaluate the individual strength of each transaction while focusing on itemset prediction, and reshape the transaction utilities based on that. We extend our work by identifying that such reshaping of transaction utilities should be adaptive to the anticipated cluster structure, if there is a specific intended prediction window. We define novel concepts for making such an anticipation and integrate Time Series Forecasting into the evaluation. We perform additional illustrative experiments to demonstrate the application of our improved technique and also discuss future direction for this work.

[1]  Bart Goethals,et al.  A primer to frequent itemset mining for bioinformatics , 2013, Briefings Bioinform..

[2]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[3]  Suh-Yin Lee,et al.  Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[4]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[5]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[6]  Jesús S. Aguilar-Ruiz,et al.  Gene association analysis: a survey of frequent pattern mining from gene expression data , 2010, Briefings Bioinform..

[7]  Xiaogang Wang,et al.  A roadmap of clustering algorithms: finding a match for a biomedical application , 2008, Briefings Bioinform..

[8]  Li Xiu,et al.  Application of data mining techniques in customer relationship management: A literature review and classification , 2009, Expert Syst. Appl..

[9]  Philip S. Yu,et al.  UP-Growth: an efficient algorithm for high utility itemset mining , 2010, KDD.

[10]  Shu-Hsien Liao,et al.  Data mining techniques and applications - A decade review from 2000 to 2011 , 2012, Expert Syst. Appl..

[11]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[12]  Skipper Seabold,et al.  Statsmodels: Econometric and Statistical Modeling with Python , 2010, SciPy.

[13]  Arun K. Somani,et al.  A Clustering based Prediction Scheme for High Utility Itemsets , 2017, KDIR.

[14]  Geert Wets,et al.  Using association rules for product assortment decisions: a case study , 1999, KDD '99.

[15]  Philip S. Yu,et al.  Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets , 2015, IEEE Transactions on Knowledge and Data Engineering.

[16]  Zhang Yi,et al.  SCALE: a scalable framework for efficiently clustering transactional data , 2009, Data Mining and Knowledge Discovery.

[17]  A. Choudhary,et al.  A fast high utility itemsets mining algorithm , 2005, UBDM '05.

[18]  Arun K. Somani,et al.  A Novel Clustering Algorithm to Capture Utility Information in Transactional Data , 2016, KDIR.

[19]  Young-Koo Lee,et al.  Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases , 2009, IEEE Transactions on Knowledge and Data Engineering.

[20]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[21]  Qiang Yang,et al.  Mining high utility itemsets , 2003, Third IEEE International Conference on Data Mining.

[22]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[23]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[24]  Rakesh Agrawal,et al.  Parallel Mining of Association Rules , 1996, IEEE Trans. Knowl. Data Eng..