Exceptional in so Many Ways—Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios

The current state of the art in supervised descriptive pattern mining is very good at automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds subgroups where a pair of target variables displays an unusual interaction. What these methods have in common is that one specific exceptionality is enough to flag up a subgroup as exceptional. This naturally leads to the question: can we also find multiple instances of exceptional behaviour simultaneously in the same subgroup? This paper provides a first, affirmative answer to that question in the form of the SPEC (Subsets of Pairwise Exceptional Correlations) model class for EMM. Given a set of predefined numeric target variables, SPEC will flag up subgroups as interesting if multiple target pairs display an unusual rank correlation. This is a fundamental extension of the EMM toolbox, which comes with additional algorithmic challenges. To address these challenges, we provide a series of algorithmic solutions whose strengths and flaws are empirically analysed.
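
To make the idea of "multiple target pairs with an unusual rank correlation" concrete, the following is a minimal Python sketch, not the SPEC quality measure itself. It assumes that a pair counts as exceptional when Spearman's rank correlation inside the subgroup differs from the correlation in the complement by more than a fixed threshold; the function names, the comparison against the complement, the threshold value, and the minimum group size of 3 are all illustrative assumptions that do not come from the paper.

```python
from itertools import combinations


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def exceptional_pairs(targets, in_subgroup, threshold=0.5):
    """List target pairs whose rho inside the subgroup differs from the rho
    in the complement by more than `threshold` (a hypothetical criterion)."""
    flagged = []
    for a, b in combinations(sorted(targets), 2):
        rows = list(zip(targets[a], targets[b], in_subgroup))
        inside = [(x, y) for x, y, member in rows if member]
        outside = [(x, y) for x, y, member in rows if not member]
        if len(inside) < 3 or len(outside) < 3:
            continue  # too few records for a meaningful rank correlation
        rho_in = spearman(*zip(*inside))
        rho_out = spearman(*zip(*outside))
        if abs(rho_in - rho_out) > threshold:
            flagged.append((a, b))
    return flagged


# Toy data: the first four records form the candidate subgroup.
targets = {
    "t1": [1, 2, 3, 4, 1, 2, 3, 4],
    "t2": [4, 3, 2, 1, 1, 2, 3, 4],
    "t3": [1, 2, 3, 4, 1, 2, 3, 4],
}
membership = [True] * 4 + [False] * 4
flagged = exceptional_pairs(targets, membership)
```

In this toy data, `t2` reverses its ordering inside the subgroup, so the pairs involving `t2` are flagged while the pair (`t1`, `t3`) is not. A subgroup could then be scored by how many pairs are flagged, which is one simple way to express "exceptional in multiple ways at once".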
