Schema matching over relations, attributes, and data values

Automatic schema matching algorithms are typically concerned only with finding attribute correspondences. However, real-world data integration problems often require matchings whose arguments span all three types of elements in relational databases: relations, attributes, and data values. This paper introduces the definitions and semantics of three additional correspondence types that involve both schema elements and data values. These correspondences cover the higher-order mappings identified in a seminal paper by Krishnamurthy, Litwin, and Kent. The paper shows that these correspondences can be automatically translated to tuple-generating dependencies (tgds), so the approach is compatible with data integration applications that leverage tgds. Two methods for automatically identifying these correspondences are developed. One requires a limited number of duplicates across data sources. The other is a general instance-based method with no such requirement. Experiments conducted on four real-world data sets demonstrate the effectiveness of the methods.
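
To make the idea of a correspondence that spans relations, attributes, and data values concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical stock-price source in which the stock code is stored as a data value, and shows two target layouts in which that same code becomes an attribute name or a relation name. All relation names, column names, and values are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical source relation: one row per (date, stock code, closing price).
# The values are made up for illustration only.
source_rows = [
    {"date": "d1", "stock": "AAA", "close": 160.0},
    {"date": "d1", "stock": "BBB", "close": 30.5},
    {"date": "d2", "stock": "AAA", "close": 161.2},
]


def values_to_attributes(rows, key, name_col, value_col):
    """Data value -> attribute: each distinct value in name_col
    becomes an attribute of the target relation."""
    target = defaultdict(dict)
    for row in rows:
        target[row[key]][row[name_col]] = row[value_col]
    return [{key: k, **attrs} for k, attrs in sorted(target.items())]


def values_to_relations(rows, name_col):
    """Data value -> relation: each distinct value in name_col
    becomes the name of a target relation holding the remaining columns."""
    relations = defaultdict(list)
    for row in rows:
        rest = {col: val for col, val in row.items() if col != name_col}
        relations[row[name_col]].append(rest)
    return dict(relations)


wide = values_to_attributes(source_rows, "date", "stock", "close")
per_stock = values_to_relations(source_rows, "stock")
```

In this sketch, `wide` holds one row per date with a column per stock code, and `per_stock` holds one relation per stock code. An attribute-only matcher could align `date` with `date`, but it would have no way to state that the value `AAA` in one source corresponds to a column or a table named `AAA` in another.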

[1]  Jinling Song,et al.  Discovering Complex Semantic Matches between Database Schemas , 2009, 2009 International Conference on Web Information Systems and Mining.

[2]  Peter Christen.  A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication , 2012 .

[3]  Li Qian,et al.  Sample-driven schema mapping , 2012, SIGMOD Conference.

[4]  Daniel P. Miranker,et al.  An Unsupervised Algorithm for Learning Blocking Schemes , 2013, 2013 IEEE 13th International Conference on Data Mining.

[5]  Alon Y. Halevy,et al.  Principles of Data Integration , 2012 .

[6]  Ahmed K. Elmagarmid,et al.  Duplicate Record Detection: A Survey , 2007, IEEE Transactions on Knowledge and Data Engineering.

[7]  Eric Peukert,et al.  A Self-Configuring Schema Matching System , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[8]  Daniel P. Miranker,et al.  SPHINX: Schema integration by example , 2007, Journal of Intelligent Information Systems.

[9]  Pradeep Ravikumar,et al.  A Comparison of String Distance Metrics for Name-Matching Tasks , 2003, IIWeb.

[10]  Pedro M. Domingos,et al.  Learning to match ontologies on the Semantic Web , 2003, The VLDB Journal.

[11]  Erhard Rahm,et al.  Schema Matching and Mapping , 2013, Schema Matching and Mapping.

[12]  Alon Y. Halevy,et al.  MiniCon: A scalable algorithm for answering queries using views , 2000, The VLDB Journal.

[13]  David Maier,et al.  Testing implications of data dependencies , 1979, SIGMOD '79.

[14]  Marcelo Arenas,et al.  Relational and XML Data Exchange , 2010, Relational and XML Data Exchange.

[15]  Pedro M. Domingos,et al.  Reconciling schemas of disparate data sources: a machine-learning approach , 2001, SIGMOD '01.

[16]  Felix Naumann,et al.  Schema matching using duplicates , 2005, 21st International Conference on Data Engineering (ICDE'05).

[17]  Divesh Srivastava,et al.  Record linkage: similarity measures and algorithms , 2006, SIGMOD Conference.

[18]  Ravi Krishnamurthy,et al.  Language features for interoperability of databases with schematic discrepancies , 1991, SIGMOD '91.

[19]  Lei Chen,et al.  Reducing Uncertainty of Schema Matching via Crowdsourcing , 2013, Proc. VLDB Endow..

[20]  Laura M. Haas,et al.  Clio: Schema Mapping Creation and Data Exchange , 2009, Conceptual Modeling: Foundations and Applications.

[21]  Ronald Fagin,et al.  Data exchange: semantics and query answering , 2003, Theor. Comput. Sci..

[22]  Alon Y. Halevy,et al.  MiniCon: A scalable algorithm for answering queries using views , 2001, VLDB 2001.

[23]  Todd D. Millstein,et al.  Navigational Plans For Data Integration , 1999, AAAI/IAAI.

[24]  Erhard Rahm,et al.  Generic schema matching, ten years later , 2011, Proc. VLDB Endow..

[25]  Daniel P. Miranker,et al.  QODI: Query as Context in Automatic Data Integration , 2013, International Semantic Web Conference.

[26]  Joann J. Ordille,et al.  Querying Heterogeneous Information Sources Using Source Descriptions , 1996, VLDB.

[27]  J. Euzenat,et al.  Ontology Matching , 2007, Springer Berlin Heidelberg.

[28]  Pedro M. Domingos,et al.  iMAP: discovering complex semantic matches between database schemas , 2004, SIGMOD '04.