Using Ontologies in Semantic Data Mining with SEGS and g-SEGS

With the expansion of the Semantic Web and the availability of numerous ontologies that provide domain background knowledge and semantic descriptors for the data, the amount of semantic data is growing rapidly. The data mining community faces a paradigm shift: instead of mining the abundance of empirical data supported by background knowledge, the new challenge is to mine the abundance of knowledge encoded in domain ontologies, constrained by heuristics computed from the empirical data collection. We address this challenge with an approach, named semantic data mining, in which domain ontologies define the hypothesis search space and the data are used as a means of constraining and guiding hypothesis search and evaluation. The use of the prototype semantic data mining systems SEGS and g-SEGS is demonstrated in a simple semantic data mining scenario and in two real-life functional genomics scenarios of mining biological ontologies with the support of experimental microarray data.
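The core idea, where an ontology defines the hypothesis space while the data prune and score it, can be illustrated with a minimal sketch. It is not the SEGS or g-SEGS implementation: the toy ontology, the gene annotations, the target set, the `min_support` threshold, and the limit of two terms per conjunction are all hypothetical. Hypotheses are conjunctions of ontology terms, annotations are propagated up the is-a hierarchy, terms below a minimum support are not specialised further, and the surviving hypotheses are ranked with a one-sided Fisher exact test, a common enrichment statistic.

```python
from math import comb

# HYPOTHETICAL toy is-a ontology (child -> parent); not taken from SEGS, GO, or KEGG.
PARENT = {
    "metabolism": "root",
    "lipid_metabolism": "metabolism",
    "glucose_metabolism": "metabolism",
    "signaling": "root",
    "kinase_signaling": "signaling",
}

# HYPOTHETICAL annotations: gene -> most specific ontology terms it is annotated with.
ANNOTATIONS = {
    "g1": {"lipid_metabolism", "kinase_signaling"},
    "g2": {"lipid_metabolism", "kinase_signaling"},
    "g3": {"lipid_metabolism"},
    "g4": {"glucose_metabolism"},
    "g5": {"glucose_metabolism", "kinase_signaling"},
    "g6": {"signaling"},
    "g7": {"glucose_metabolism"},
    "g8": {"kinase_signaling"},
}

# HYPOTHETICAL target class, e.g. genes ranked highest in a microarray experiment.
TARGET = {"g1", "g2", "g3"}


def ancestors(term):
    """Return the term itself plus all of its ancestors up to the root."""
    out = set()
    while term is not None:
        out.add(term)
        term = PARENT.get(term)  # "root" has no parent, so the walk stops there
    return out


def children(term):
    """Direct specialisations of a term in the is-a hierarchy."""
    return [c for c, p in PARENT.items() if p == term]


# Propagate annotations upwards: a gene annotated with a term also belongs to its ancestors.
CLOSURE = {g: set().union(*(ancestors(t) for t in ts)) for g, ts in ANNOTATIONS.items()}


def covered(terms):
    """Genes that satisfy every term in a conjunctive hypothesis."""
    return {g for g, closure in CLOSURE.items() if set(terms) <= closure}


def fisher_p(genes):
    """One-sided Fisher exact test (hypergeometric upper tail) for enrichment in TARGET."""
    N, K, n = len(ANNOTATIONS), len(TARGET), len(genes)
    k = len(genes & TARGET)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def search(min_support=2):
    """Ontology-defined search space, pruned and ranked using the data."""
    # Specialise top-down from the root. A term below min_support is not refined,
    # since its descendants can only cover subsets of its genes.
    frequent, frontier = [], children("root")
    while frontier:
        term = frontier.pop()
        if len(covered({term})) >= min_support:
            frequent.append(term)
            frontier.extend(children(term))
    # Candidate hypotheses: single terms plus conjunctions of two frequent terms,
    # skipping pairs where one term already subsumes the other.
    rules = [(t,) for t in frequent]
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            if a in ancestors(b) or b in ancestors(a):
                continue
            if len(covered({a, b})) >= min_support:
                rules.append(tuple(sorted((a, b))))
    # Rank by p-value (smaller means stronger enrichment in TARGET).
    return sorted((fisher_p(covered(r)), r) for r in rules)


if __name__ == "__main__":
    for p, rule in search(min_support=2):
        print(f"{p:.4f}  {' AND '.join(rule)}")
```

On this toy data the sketch keeps nine hypotheses; the single term `lipid_metabolism` ranks first (p = 1/56 ≈ 0.018), while the pair `glucose_metabolism AND lipid_metabolism` is never scored because no gene supports it. Pruning on support is safe here because a descendant term can only cover a subset of its ancestor's genes, so the data bound the search without discarding any hypothesis that meets the threshold.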
