Two Phase User Driven Schema Matching

In recent years it has become apparent that schema matching is a labor-intensive process that is costly in resources, which has led to the development of various automated tools intended to replace the human experts involved. To this end, we propose two new ideas. The first, which we call two-phase schema matching, separates matching techniques into strong and weak ones. The second uses information that a human expert provides to the system during schema matching to determine how the various matching techniques are combined. A system encompassing both ideas is easily tunable and allows the human expert to become part of the matching process and help the system choose the best techniques to use. In extensive experiments, we demonstrate that this approach outperforms contemporary state-of-the-art systems on relational databases. We also demonstrate that single-purpose (or niche) matchers can be helpful in such a system, since the system can opt to use them when appropriate.
