Why is schema matching tough and what can we do about it?

In this paper we analyze the problem of schema matching, explain why it is such a "tough" problem, and suggest directions for handling it effectively. In particular, we present the monotonicity principle and show how it leads to the use of top-K mappings rather than a single mapping.
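
To make the notion of top-K mappings concrete, the following is a minimal sketch and not the method of the paper. It assumes a hypothetical similarity matrix between the attributes of two schemas, restricts attention to 1:1 mappings, scores a mapping by the sum of its pairwise similarities, and enumerates all candidates by brute force. The function name `top_k_mappings`, the scoring rule, and the sample values are all illustrative assumptions.

```python
from itertools import permutations


def top_k_mappings(similarity, k):
    """Return the k highest-scoring 1:1 mappings as (score, mapping) pairs.

    similarity[i][j] is an assumed similarity score between attribute i of
    the source schema and attribute j of the target schema. The source schema
    must not have more attributes than the target schema.
    """
    n_source = len(similarity)
    n_target = len(similarity[0])
    scored = []
    # Brute force: every way to assign distinct target attributes to the
    # source attributes. Feasible only for very small schemas.
    for targets in permutations(range(n_target), n_source):
        mapping = tuple(enumerate(targets))
        # Assumed aggregate score: the sum of the pairwise similarities.
        score = sum(similarity[i][j] for i, j in mapping)
        scored.append((score, mapping))
    # Best mappings first; keep the first k.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical similarity scores for two three-attribute schemas.
similarity = [
    [0.9, 0.2, 0.1],
    [0.3, 0.6, 0.5],
    [0.1, 0.5, 0.6],
]
best = top_k_mappings(similarity, 2)
for score, mapping in best:
    print(round(score, 2), mapping)
```

Returning a ranked list rather than only the first entry keeps the runner-up mappings available, so a later step or a human reviewer can still pick the correct mapping when the top-scoring one turns out to be wrong. Exhaustive enumeration grows factorially with schema size, so a practical system would need a ranked-assignment algorithm instead.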
