A two-phase approach for unexpected pattern mining

Abstract: A typical mining task is to retrieve all frequent patterns from a multi-dimensional dataset. Those patterns give a basic idea of what the data look like and reveal their inherent regularities. However, this is mainly useful for an unfamiliar dataset; for datasets that are analyzed periodically, “unexpected” patterns are more interesting (e.g., some customers decided to subscribe to long-term deposits despite the burden of a housing loan). In this paper, we propose a new mining task, unexpected mining, which aims to retrieve frequent patterns that are not valid in a reference dataset but are significant enough in a specific subgroup. Given a reference dataset, we generate, step by step, all unexpected patterns for all subgroups. We extend existing mining approaches to support the new mining task efficiently. In particular, our scheme consists of an offline process and an online process. The offline process generates candidate patterns and builds an index table. The online process retrieves unexpected patterns for user-defined subgroups and a given support threshold. Experiments on real datasets show that our approach can find interesting patterns and is very efficient compared to existing approaches.
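To make the two-phase idea concrete, the following is a minimal brute-force sketch, not the method of the paper. The abstract does not specify the candidate generation or the index structure, so everything here is an assumption made for illustration: records are dictionaries, a pattern is a set of (attribute, value) pairs, a subgroup is the subset of the reference data selected by a predicate, and a pattern counts as unexpected when its support is below the threshold in the reference data but at or above it in the subgroup. The names `support`, `build_index`, and `unexpected_patterns` are hypothetical.

```python
from itertools import combinations


def support(pattern, records):
    """Fraction of records that contain every (attribute, value) pair in pattern."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if all(r.get(a) == v for a, v in pattern))
    return hits / len(records)


def build_index(reference, max_len=2):
    """Offline phase (assumed form): enumerate candidate patterns of up to
    max_len items and store each one with its support in the reference data."""
    candidates = set()
    for record in reference:
        items = sorted(record.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                candidates.add(frozenset(combo))
    return {p: support(p, reference) for p in candidates}


def unexpected_patterns(index, reference, in_subgroup, min_sup):
    """Online phase (assumed form): return patterns that are infrequent in the
    reference data but frequent in the subgroup chosen by in_subgroup."""
    subgroup = [r for r in reference if in_subgroup(r)]
    found = {}
    for pattern, ref_sup in index.items():
        if ref_sup >= min_sup:
            continue  # already frequent in the reference data, so not unexpected
        sub_sup = support(pattern, subgroup)
        if sub_sup >= min_sup:
            found[pattern] = (ref_sup, sub_sup)
    return found


# Toy data invented for illustration: two young customers hold a housing
# loan and still subscribe to a deposit; the other six do not subscribe.
reference = (
    [{"loan": "yes", "deposit": "yes", "age": "young"}] * 2
    + [{"loan": "yes", "deposit": "no", "age": "old"}] * 2
    + [{"loan": "no", "deposit": "no", "age": "old"}] * 4
)

index = build_index(reference, max_len=2)
found = unexpected_patterns(
    index, reference, in_subgroup=lambda r: r["age"] == "young", min_sup=0.5
)
target = frozenset({("loan", "yes"), ("deposit", "yes")})
print(found[target])
```

In this sketch the index is built once and reused across queries, so only the subgroup supports are recomputed online. The result also contains patterns that merely restate the subgroup definition (here, `age = young`); a practical system would likely filter those out, and would replace the exhaustive enumeration with a proper frequent-pattern miner.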
