An overview on subgroup discovery: foundations and applications

Subgroup discovery is a data mining technique that extracts interesting rules with respect to a target variable. An important characteristic of this task is that it combines predictive and descriptive induction. This paper presents an overview of subgroup discovery, focusing on its foundations, algorithms, and advanced studies, together with the applications reported in the specialised literature.
