Discovering mappings in hierarchical data from multiple sources using the inherent structure

Unprecedented amounts of media data are publicly accessible. However, it is increasingly difficult to integrate relevant media from multiple, diverse sources for effective applications. A multimodal integration system requires metadata, such as ontologies, that describe media resources and media components. Such metadata are generally application-dependent, which causes difficulties when media need to be shared across application domains. A mechanism is therefore needed that can relate the common and uncommon terms and media components. In this paper, we develop an algorithm that automatically mines and discovers mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in these types of data. We evaluate the performance of this algorithm under various parameter settings using both synthetic and real-world data collections, and show that structure-based mining of relationships achieves a high degree of precision.
