Association Rules and Frequent Patterns

Large datasets of transactional records, in the form of co-occurring events, variables or features, may contain interesting knowledge in terms of implicit relations and patterns. Association Rule Mining (ARM) is the systematic extraction of frequent patterns from data in the form of rules that expose and explicitly represent the relations between variables. It is a ‘descriptive’ data mining technique, which provides a compact, high-level description of interesting patterns found in historical data. ARM has been successfully employed in many application domains, including market basket analysis, Web user behaviour, substructures of online social networks, intrusion detection in communication networks, co-expression of genes in bioinformatics, and substructures of molecular compounds in chemoinformatics. The information represented by association rules can be used as the basis for decisions, to discover regularities in the data and, in general, to formulate new data-driven scientific hypotheses. ARM is computationally hard, and practical applications require critical design choices in the data analysis workflow, including data pre-processing, data layout, algorithm selection and tuning of the algorithm’s parameters. Many ARM algorithms have been proposed over more than two decades: some are suited to specific data formats, properties or database layouts, while others solve a reduced or an extended formulation of the ARM problem to improve efficiency or the interestingness of the result. This article introduces the ARM problem and discusses its complexity, briefly describes the most important algorithms and extended formulations, and finally presents some applications.
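
To make the notion of a rule extracted from frequent patterns concrete, the following is a minimal Python sketch under stated assumptions. It uses the standard support and confidence measures and a brute-force enumeration of itemsets; the toy transactions, the thresholds and the names `support` and `mine_rules` are hypothetical and chosen only for illustration. It is not one of the algorithms described in the article, and its exponential candidate enumeration is exactly the cost that practical ARM algorithms are designed to avoid.

```python
from itertools import combinations

# Hypothetical toy transactions (illustrative only, not from the article).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rule mining; exponential in the number of distinct items."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            # Skip infrequent itemsets (and empty ones, to avoid dividing by zero).
            if s < min_support or s == 0:
                continue
            # Split the frequent itemset into antecedent -> consequent.
            for k in range(1, size):
                for antecedent in combinations(candidate, k):
                    consequent = tuple(i for i in candidate if i not in antecedent)
                    confidence = s / support(antecedent, transactions)
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, s, confidence))
    return rules

rules = mine_rules(transactions, min_support=0.4, min_confidence=0.7)
for antecedent, consequent, s, c in rules:
    print(f"{antecedent} -> {consequent}  support={s:.2f}  confidence={c:.2f}")
```

Here the support of an itemset is the fraction of transactions containing it, and the confidence of a rule A -> B is support(A ∪ B) / support(A). Raising either threshold shrinks the rule set, which is one example of the parameter tuning mentioned above.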
