Interesting association rule mining with consistent and inconsistent rule detection from big sales data in distributed environment

Abstract There is an increasing demand for mining interesting patterns from big data. Analyzing such large volumes of data is computationally expensive with traditional methods. The purpose of this paper is twofold. First, it presents a novel approach for identifying consistent and inconsistent association rules in sales data located in a distributed environment. Second, it overcomes the main-memory bottleneck and computing-time overhead of a single computing system by distributing the computation across a multi-node cluster. The proposed method first extracts frequent itemsets for each zone using existing distributed frequent pattern mining algorithms. The paper also compares the time efficiency of a MapReduce-based frequent pattern mining algorithm with that of the Count Distribution Algorithm (CDA) and Fast Distributed Mining (FDM) algorithms. The association rules generated from the frequent itemsets are so numerous that they are difficult to analyze. Thus, a MapReduce-based consistent and inconsistent rule detection (MR-CIRD) algorithm is proposed to detect the consistent and inconsistent rules in big data and to provide useful, actionable knowledge to domain experts. The pruned interesting rules also support better marketing strategy. The extracted consistent and inconsistent rules are evaluated and compared using different interestingness measures, and the experimental results lead to the final conclusions.
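The abstract does not define "consistent" and "inconsistent" rules, so the following is only an illustrative sketch under an assumed reading: a rule is treated as consistent if it passes the support and confidence thresholds in every zone, and as inconsistent if it passes in some zones but not in all. The function names `mine_rules` and `classify_rules`, the thresholds, and the toy zone data are hypothetical and are not taken from the paper. The sketch runs on a single machine and does not reproduce MR-CIRD or its MapReduce implementation.

```python
from itertools import permutations


def mine_rules(transactions, min_support, min_confidence):
    """Return single-item rules (a, b), read as a -> b, that pass both thresholds.

    Brute force over item pairs; only suitable for tiny illustrative inputs.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    rules = set()
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        if count_a == 0:
            continue
        support = count_ab / n            # fraction of transactions with a and b
        confidence = count_ab / count_a   # estimate of P(b | a)
        if support >= min_support and confidence >= min_confidence:
            rules.add((a, b))
    return rules


def classify_rules(zones, min_support=0.5, min_confidence=0.6):
    """Split rules into (consistent, inconsistent) under the ASSUMED definition:
    consistent = holds in every zone, inconsistent = holds in some but not all."""
    per_zone = {
        zone: mine_rules(transactions, min_support, min_confidence)
        for zone, transactions in zones.items()
    }
    all_rules = set().union(*per_zone.values())
    consistent = set.intersection(*per_zone.values())
    inconsistent = all_rules - consistent
    return consistent, inconsistent


# Hypothetical toy sales data for two zones.
zones = {
    "north": [{"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}, {"milk"}],
    "south": [{"bread", "milk"}, {"bread", "milk", "butter"},
              {"bread", "butter"}, {"butter"}],
}
consistent, inconsistent = classify_rules(zones)
print("consistent:", sorted(consistent))
print("inconsistent:", sorted(inconsistent))
```

On this toy data, the bread/milk rules pass in both zones, while the bread/butter rules pass only in the "south" zone, so they land in the inconsistent set under the assumed definition. In a distributed setting, the per-zone mining step would correspond to work done independently on each node, with only the rule sets combined afterwards.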