A Collective, Probabilistic Approach to Schema Mapping Using Diverse Noisy Evidence

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of schema mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. Because selection must reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem that captures interactions between mappings as well as inconsistencies and incompleteness in the input. We then introduce Collective Mapping Discovery (CMD), our solution to this problem, which uses state-of-the-art probabilistic reasoning techniques. Our evaluation on a wide range of integration scenarios, including several real-world domains, demonstrates that CMD effectively combines data and metadata information to infer highly accurate mappings even with significant levels of noise.
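To make the selection problem concrete, the following is a minimal sketch under simplifying assumptions that are ours, not the paper's. It assumes each candidate mapping's output on the source instance has already been computed as a set of target facts. It scores a selection with a hypothetical cost: unit penalties for data-example facts that no selected mapping explains and for produced facts missing from the data example, plus a weighted size term. The names `cost`, `select_mappings`, `produced`, `size`, and `size_weight` are illustrative only. The exhaustive subset search shown here is exponential and only suitable for toy inputs; CMD instead relies on probabilistic reasoning to scale, and its actual objective and weights are not given in this abstract.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A target fact, e.g. ("emp", "ann", "sales") for emp(ann, sales).
Fact = Tuple[str, ...]


def cost(selection: Iterable[str],
         produced: Dict[str, Set[Fact]],
         target: Set[Fact],
         size: Dict[str, int],
         size_weight: float) -> float:
    """Hypothetical selection objective (an assumption, not CMD's formulation)."""
    selection = list(selection)
    # Target facts generated by the union of the selected mappings.
    covered: Set[Fact] = set().union(*(produced[m] for m in selection))
    unexplained = len(target - covered)  # data-example facts nobody produces
    errors = len(covered - target)       # produced facts absent from the example
    complexity = size_weight * sum(size[m] for m in selection)
    return unexplained + errors + complexity


def select_mappings(produced: Dict[str, Set[Fact]],
                    target: Set[Fact],
                    size: Dict[str, int],
                    size_weight: float = 0.5) -> Tuple[FrozenSet[str], float]:
    """Exhaustively score every subset of candidates (toy sizes only)."""
    candidates = sorted(produced)
    best: FrozenSet[str] = frozenset()
    best_cost = cost(best, produced, target, size, size_weight)
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            c = cost(combo, produced, target, size, size_weight)
            if c < best_cost:  # strict: ties keep the earlier (smaller) selection
                best, best_cost = frozenset(combo), c
    return best, best_cost


# Hypothetical toy input: facts each candidate mapping would produce from
# the source instance, and a noisy target data example.
produced = {
    "m1": {("emp", "ann", "sales"), ("emp", "bob", "hr")},
    "m2": {("emp", "ann", "sales"), ("emp", "bob", "hr"), ("emp", "cat", "it")},
    "m3": {("emp", "zed", "ops")},
}
size = {"m1": 1, "m2": 2, "m3": 1}
# ("emp", "dan", "hr") is noise: no candidate mapping can explain it.
target = {("emp", "ann", "sales"), ("emp", "bob", "hr"),
          ("emp", "cat", "it"), ("emp", "dan", "hr")}

best, best_cost = select_mappings(produced, target, size)
print(sorted(best), best_cost)  # prints ['m2'] 2.0
```

In this toy run, selecting m1 alongside m2 only adds size without explaining any further fact, so the two mappings interact and m2 alone wins; m3 is rejected because it produces only a fact absent from the data example, and the noisy fact about dan is simply left unexplained rather than forcing a spurious mapping into the selection.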
