Concept Capture Based on Column Matching and Clustering

Building an ontology from scratch requires identifying the basic concepts of the application domain. In database integration, draft concepts can be captured directly by processing the database schemas. In this context, we present an automatic approach that captures concepts from related databases by matching and clustering relational schema columns. The matching phase combines three individual name matchers in a composite fashion to compute the similarity between column names; these similarities are then used as classifiers for clustering. In the clustering phase, a neural network matcher categorizes schema columns into clusters, using column constraints together with the results of the matching phase so that multiple criteria are considered jointly. Finally, each concept is defined as a cluster of columns that represent the same meaning. The concepts discovered by our approach can serve as draft material or seeds for further, more comprehensive concept capture.
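The two-phase idea can be illustrated with a minimal sketch. The abstract does not name the three name matchers, the constraint features, or the network architecture, so everything below is an assumption chosen for illustration: the matchers are an edit-based ratio, a token Jaccard score, and a character-trigram Dice score, combined by a weighted average; the only "constraint" is equality of a declared data type; and the neural network is replaced by a simple threshold-based single-link grouping. The names `name_sim`, `combined_sim`, `cluster`, and the parameters `alpha` and `threshold` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    # Split a column name on separators and camelCase boundaries.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {p.lower() for p in parts}


def edit_sim(a, b):
    # Assumed matcher 1: edit-style similarity of the raw strings.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_sim(a, b):
    # Assumed matcher 2: Jaccard overlap of name tokens.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def trigram_sim(a, b):
    # Assumed matcher 3: Dice coefficient over character trigrams.
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def name_sim(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    # Composite matcher: weighted average of the three individual scores.
    scores = (edit_sim(a, b), token_sim(a, b), trigram_sim(a, b))
    return sum(w * s for w, s in zip(weights, scores))


def combined_sim(col_a, col_b, alpha=0.7):
    # Columns are (table, name, dtype) tuples. The constraint score here is
    # only a data-type equality check, a stand-in for richer constraints.
    constraint = 1.0 if col_a[2] == col_b[2] else 0.0
    return alpha * name_sim(col_a[1], col_b[1]) + (1 - alpha) * constraint


def cluster(columns, threshold=0.6, alpha=0.7):
    # Single-link grouping via union-find; NOT the neural network matcher
    # described in the abstract, just a simple substitute.
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if combined_sim(columns[i], columns[j], alpha) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, col in enumerate(columns):
        groups.setdefault(find(i), []).append(col)
    return list(groups.values())


if __name__ == "__main__":
    cols = [
        ("customer", "cust_id", "int"),
        ("orders", "customer_id", "int"),
        ("customer", "email", "varchar"),
        ("contact", "email_address", "varchar"),
        ("orders", "total", "decimal"),
    ]
    for concept in cluster(cols):
        print([f"{table}.{name}" for table, name, _ in concept])
```

Each returned group plays the role of a draft concept. The threshold and weights are arbitrary here and would need tuning on real schemas; a learned model, as the abstract proposes, would replace the fixed threshold rule.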
