Exceptionally monotone models—the rank correlation model class for Exceptional Model Mining

Exceptional Model Mining (EMM) seeks coherent subgroups of a dataset in which multiple target attributes interact in an unusual way. One previously investigated form of interaction is Pearson's correlation coefficient between two targets; with it, EMM finds subgroups with an exceptionally linear relation between the targets. In this paper, we enrich the EMM toolbox by developing the more general rank correlation model class, which finds subgroups with an exceptionally monotone relation between the targets. Apart from catering for this richer set of relations, the rank correlation model class does not necessarily require the assumption of target normality, which is implicitly invoked in the Pearson correlation model class. Furthermore, it is less sensitive to outliers.
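
The contrast between a linear and a monotone relation can be illustrated with a minimal sketch. It is not the paper's implementation: it uses Spearman's coefficient as one example of a rank correlation, computed as Pearson's coefficient on average ranks, and the helper names (`ranks`, `pearson`, `spearman`) and the toy data are assumptions made for illustration only.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical): a strictly increasing but strongly nonlinear relation.
x = list(range(1, 11))
y_monotone = [math.exp(v) for v in x]
linear_score = pearson(x, y_monotone)     # penalized for the nonlinearity
monotone_score = spearman(x, y_monotone)  # perfect monotone agreement

# Toy data (hypothetical): a linear relation with one extreme outlier.
y_outlier = [float(v) for v in x[:-1]] + [-1000.0]
linear_with_outlier = pearson(x, y_outlier)
rank_with_outlier = spearman(x, y_outlier)

print(linear_score, monotone_score)
print(linear_with_outlier, rank_with_outlier)
```

On the first toy dataset the rank-based score reflects the perfectly monotone relation while the linear score does not; on the second, a single outlier flips the sign of the linear score but not of the rank-based one. Both datasets are constructed for illustration and say nothing about the paper's experimental results.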
