Detecting Interesting Differences: Data Mining in Health Insurance Data using Outlier Detection and Subgroup Discovery

[1] Henrik Grosskreutz et al. Non-redundant Subgroup Discovery Using a Closure System. ECML/PKDD, 2009.

[2] Arno J. Knobbe et al. Diverse subgroup set discovery. Data Mining and Knowledge Discovery, 2012.

[3] Prabhakar Raghavan et al. A Linear Method for Deviation Detection in Large Databases. KDD, 1996.

[4] Frank Puppe et al. Local Models for Expectation-Driven Subgroup Discovery. 2011 IEEE 11th International Conference on Data Mining, 2011.

[5] Zhi-Hua Zhou et al. Isolation Forest. 2008 Eighth IEEE International Conference on Data Mining, 2008.

[6] D. Rubin et al. Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. 1985.

[7] Malcolm K. Sparrow et al. License to Steal: Why Fraud Plagues America's HealthCare System. 1997.

[8] Stephen D. Bay et al. Detecting Group Differences: Mining Contrast Sets. Data Mining and Knowledge Discovery, 2001.

[9] Peter A. Flach et al. Rule Evaluation Measures: A Unifying View. ILP, 1999.

[10] Kwang-Ho Ro et al. Outlier detection for high-dimensional data. 2015.

[11] Hans-Peter Kriegel et al. LOF: identifying density-based local outliers. SIGMOD '00, 2000.

[12] Howard J. Hamilton et al. Interestingness measures for data mining: A survey. CSUR, 2006.

[13] Francisco Herrera et al. Subgroup discover in large size data sets preprocessed using stratified instance selection for increasing the presence of minority classes. Pattern Recognit. Lett., 2008.

[14] Ramakrishnan Srikant et al. Fast algorithms for mining association rules. VLDB, 1998.

[15] Branko Kavsek et al. APRIORI-SD: Adapting Association Rule Learning to Subgroup Discovery. IDA, 2006.

[16] Francisco Herrera et al. Making CN2-SD subgroup discovery algorithm scalable to large size data sets using instance selection. Expert Syst. Appl., 2008.

[17] P. Rousseeuw et al. The Bagplot: A Bivariate Boxplot. 1999.

[18] Frank Puppe et al. SD-Map - A Fast Algorithm for Exhaustive Subgroup Discovery. PKDD, 2006.

[19] Schloss Birlinghoven et al. Cascaded Subgroups Discovery with an Application to Regression. 2008.

[20] Willi Klösgen. Applications and Research Problems of Subgroup Mining. ISMIS, 1999.

[21] Stefan Wrobel et al. Tight Optimistic Estimates for Fast Subgroup Discovery. ECML/PKDD, 2008.

[22] Frank Puppe et al. Fast exhaustive subgroup discovery with numerical target concepts. Data Mining and Knowledge Discovery, 2016.

[23] Marvin Meeng et al. Cost-based quality measures in subgroup discovery. Journal of Intelligent Information Systems, 2014.

[24] Johannes Fürnkranz et al. ROC 'n' Rule Learning: Towards a Better Understanding of Covering Algorithms. Machine Learning, 2005.

[25] Sridhar Ramaswamy et al. Efficient algorithms for mining outliers from large data sets. SIGMOD '00, 2000.

[26] Sung-Hyuk Cha. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. 2007.

[27] Willi Klösgen et al. Census Data Mining – An Application. 2002.

[28] D. Rubin et al. The central role of the propensity score in observational studies for causal effects. 1983.

[29] Christos Faloutsos et al. Cross-Outlier Detection. SSTD, 2003.

[30] Christos Faloutsos et al. OBE: Outlier by Example. PAKDD, 2004.

[31] Leo Breiman et al. Classification and Regression Trees. 1984.

[32] Douglas M. Hawkins. Identification of Outliers. Monographs on Applied Probability and Statistics, 1980.

[33] Charles Elkan et al. The Foundations of Cost-Sensitive Learning. IJCAI, 2001.

[34] Stefan Rüping et al. On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 2009.

[35] Andrew W. Moore et al. Bayesian Network Anomaly Pattern Detection for Disease Outbreaks. ICML, 2003.

[36] Leo Breiman et al. Random Forests. Machine Learning, 2001.

[37] R. Cook. Influential Observations in Linear Regression. 1979.

[38] Anthony K. H. Tung et al. Mining top-n local outliers in large databases. KDD '01, 2001.

[39] Florian Lemmerich et al. Fast Subgroup Discovery for Continuous Target Concepts. ISMIS, 2009.

[40] Raymond T. Ng et al. Algorithms for Mining Distance-Based Outliers in Large Datasets. VLDB, 1998.

[41] Henrik Grosskreutz et al. Subgroup Discovery for Election Analysis: A Case Study in Descriptive Data Mining. Discovery Science, 2010.

[42] Siegfried Nijssen et al. Efficient Algorithms for Finding Richer Subgroup Descriptions in Numeric and Nominal Data. 2012 IEEE 12th International Conference on Data Mining, 2012.

[43] Alípio Mário Jorge et al. Distribution Rules with Numeric Attributes of Interest. PKDD, 2006.

[44] Wouter Duivesteijn et al. Exploiting False Discoveries - Statistical Validation of Patterns and Quality Measures in Subgroup Discovery. 2011 IEEE 11th International Conference on Data Mining, 2011.

[45] Anil K. Ghosh et al. On Maximum Depth and Related Classifiers. 2005.

[46] Clara Pizzuti et al. Fast Outlier Detection in High Dimensional Spaces. PKDD, 2002.

[47] John W. Tukey et al. Exploratory Data Analysis. 1979.

[48] Peter A. Flach et al. RSD: Relational Subgroup Discovery through First-Order Feature Construction. ILP, 2002.

[49] Vic Barnett et al. Outliers in Statistical Data. 1980.

[50] Dimitrios Gunopulos et al. Constraint-Based Rule Mining in Large, Dense Databases. Proceedings 15th International Conference on Data Engineering, 1999.

[51] N. Altman. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. 1992.

[52] F. Massey. The Kolmogorov-Smirnov Test for Goodness of Fit. 1951.

[53] Joost N. Kok et al. Multi-class Correlated Pattern Mining. KDID, 2005.

[54] Geoffrey I. Webb et al. Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining. J. Mach. Learn. Res., 2009.

[55] Heikki Mannila et al. Principles of Data Mining. Undergraduate Topics in Computer Science, 2001.

[56] Witold Pedrycz et al. Handbook of Data Mining and Knowledge Discovery. 2002.

[57] Christos Faloutsos et al. LOCI: fast outlier detection using the local correlation integral. Proceedings 19th International Conference on Data Engineering, 2003.

[58] H. Cramér. Mathematical methods of statistics. 1947.

[59] Arno J. Knobbe et al. Pattern Teams. PKDD, 2006.

[60] Jinyan Li et al. Efficient mining of emerging patterns: discovering trends and differences. KDD '99, 1999.

[61] Ming Li et al. An Introduction to Kolmogorov Complexity and Its Applications. Texts in Computer Science, 1997.

[62] Eleazar Eskin et al. Anomaly Detection over Noisy Data using Learned Probability Distributions. ICML, 2000.

[63] Eamonn J. Keogh et al. Towards parameter-free data mining. KDD, 2004.

[64] Wouter Duivesteijn et al. Discovering Local Subgroups, with an Application to Fraud Detection. PAKDD, 2013.

[65] Stefan Wrobel et al. An Algorithm for Multi-relational Discovery of Subgroups. PKDD, 1997.

[66] Alan Agresti et al. Categorical Data Analysis. International Encyclopedia of Statistical Science, 1991.

[67] Richard Frank et al. Exploring the structural characteristics of social networks in a large criminal court database. 2013 IEEE International Conference on Intelligence and Security Informatics, 2013.

[68] P. Rousseeuw et al. Computing depth contours of bivariate point clouds. 1996.

[69] Roberto J. Bayardo et al. Mining the most interesting rules. KDD '99, 1999.

[70] Das Amrita et al. Mining Association Rules between Sets of Items in Large Databases. 2013.

[71] Barbara F. I. Pieters et al. Subgroup Discovery in Ranked Data, with an Application to Gene Set Enrichment. 2010.

[72] C.J.H. Mann et al. Handbook of Data Mining and Knowledge Discovery. 2004.

[73] Frank Puppe et al. Semi-Automatic Visual Subgroup Mining using VIKAMINE. J. Univers. Comput. Sci., 2005.

[74] Nada Lavrac et al. Expert-Guided Subgroup Discovery: Methodology and Application. J. Artif. Intell. Res., 2011.

[75] Franco Turini et al. k-NN as an implementation of situation testing for discrimination discovery and prevention. KDD, 2011.

[76] Donald E. Ramirez et al. Bringing Order to Outlier Diagnostics in Regression Models. 2001.

[77] A. Choudhary et al. A fast high utility itemsets mining algorithm. UBDM '05, 2005.

[78] Trevor Hastie et al. The Elements of Statistical Learning. 2001.

[79] Peter A. Flach et al. Subgroup Discovery with CN2-SD. J. Mach. Learn. Res., 2004.

[80] Mohammad Zulkernine et al. Anomaly Based Network Intrusion Detection with Unsupervised Outlier Detection. 2006 IEEE International Conference on Communications, 2006.

[81] D. Donoho et al. Breakdown Properties of Location Estimates Based on Halfspace Depth and Projected Outlyingness. 1992.

[82] Kenji Yamanishi et al. Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner. KDD '01, 2001.

[83] Hans-Peter Kriegel et al. LoOP: local outlier probabilities. CIKM, 2009.

[84] Arno Knobbe et al. Exceptional Model Mining. ECML/PKDD, 2008.

[85] A. Knobbe et al. Flexible Enrichment with Cortana – Software Demo. 2011.

[86] L. J. Bain et al. Introduction to Probability and Mathematical Statistics. 1987.

[87] Jeff G. Schneider et al. Detecting anomalous records in categorical datasets. KDD '07, 2007.

[88] Luc De Raedt et al. Top-Down Induction of Clustering Trees. ICML, 1998.

[89] Wojtek Kowalczyk et al. Finding Fraud in Health Insurance Data with Two-Layer Outlier Detection Approach. DaWaK, 2011.

[90] Rajeev Motwani et al. Beyond market baskets: generalizing association rules to correlations. SIGMOD '97, 1997.

[91] Siegfried Nijssen et al. Efficient algorithms for finding optimal binary features in numeric and nominal labeled data. Knowledge and Information Systems, 2015.

[92] Varun Chandola et al. Anomaly detection: A survey. CSUR, 2009.

[93] Hector N. Qirko. Collapse: How Societies Choose to Fail or Succeed. 2005.

[94] Arno J. Knobbe et al. Non-redundant Subgroup Discovery in Large and Complex Data. ECML/PKDD, 2011.

[95] Geoffrey I. Webb et al. Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, 2018.

[96] Wojtek Kowalczyk et al. An Interactive Approach to Outlier Detection. RSKT, 2010.

[97] Nir Friedman et al. Bayesian Network Classifiers. Machine Learning, 1997.

[98] C. J. van Rijsbergen et al. Information Retrieval. Encyclopedia of GIS, 1979.

[99] Wojtek Kowalczyk et al. Hunting for Fraudsters in Random Forests. HAIS, 2012.

[100] Peter A. Flach et al. Technical Note: Towards ROC Curves in Cost Space. ArXiv, 2011.

[101] Luc De Raedt et al. Using Logical Decision Trees for Clustering. ILP, 1997.

[102] F. J. Anscombe et al. Rejection of Outliers. 1960.

[103] Geoffrey I. Webb. Discovering Significant Patterns. Machine Learning, 2007.

[104] Qiang Yang et al. Mining high utility itemsets. Third IEEE International Conference on Data Mining, 2003.

[105] Christian Böhm et al. CoCo: coding cost for parameter-free outlier detection. KDD, 2009.

[106] M. Kulldorff. A spatial scan statistic. 1997.