An Exploration Of Understanding Heterogeneity Through Data Mining

The development of the Internet and the Web has produced many distributed information resources that are, in general, structurally and semantically heterogeneous, even within the same domain. However, heterogeneity itself has not been studied formally, in a way that would let other programs process representations of different kinds of heterogeneity generically and automatically. Most descriptions and categorization schemes for heterogeneity have been given in languages specific to individual research groups. We believe that a thorough study of heterogeneity can ultimately benefit both the data integration and data mining communities. In this paper we give a brief survey of the ways heterogeneity has been categorized in the literature, and then present a case study on detecting a specific class of heterogeneity in the setting of Semantic Web ontologies: the class that can be discovered only by data-driven approaches. Finally, we propose an automatic ontology matching system that detects this heterogeneity using redescription mining techniques. We also believe that automatic ontology matching is a helpful step in mining multiple information sources in heterogeneous scenarios.
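To make the data-driven idea concrete, the following is a minimal sketch, not the system proposed in the paper. It assumes that two ontologies share some instance identifiers and treats two concepts as candidate redescriptions of each other when their instance sets have a high Jaccard coefficient, the overlap measure commonly used in redescription mining. The function names, the threshold, and the toy data are all hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard coefficient of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_concepts(source, target, threshold=0.8):
    """Return (source_concept, target_concept, score) triples whose instance
    sets overlap at or above the threshold, best matches first.

    `source` and `target` map concept names to sets of instance identifiers.
    The threshold value is an illustrative assumption, not taken from the paper.
    """
    matches = []
    for (s_name, s_inst), (t_name, t_inst) in product(source.items(), target.items()):
        score = jaccard(s_inst, t_inst)
        if score >= threshold:
            matches.append((s_name, t_name, score))
    return sorted(matches, key=lambda m: -m[2])


# Hypothetical toy ontologies that describe the same individuals
# under different concept names.
onto_a = {"Professor": {"ann", "bob", "cy"}, "Student": {"dee", "eli"}}
onto_b = {
    "FacultyMember": {"ann", "bob", "cy"},
    "Undergraduate": {"dee"},
    "Course": {"db101"},
}

matches = match_concepts(onto_a, onto_b)
print(matches)
```

In this toy data, the names Professor and FacultyMember share no lexical similarity, so a purely name-based matcher would have nothing to go on; only the overlap of their instances suggests that they correspond.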
