Explaining Subgroups through Ontologies

Subgroup discovery (SD) methods find interesting subsets of objects of a given class. The subgroup descriptions (rules) are themselves good explanations of the subgroups. Domain ontologies provide additional descriptions of the data and can offer alternative explanations of the discovered rules; such explanations, expressed in terms of higher-level ontology concepts, have the potential to provide new insights into the domain of investigation. We show that this additional explanatory power can be ensured by using recently developed semantic SD methods. We present a new approach to explaining subgroups through ontologies and demonstrate its utility on a gene expression profiling use case in which groups of patients, identified through SD in terms of gene expression, are further explained through concepts from the Gene Ontology and KEGG Orthology.
