Implementation of Two-step Clustering with Self-organizing Maps for Semantic Integration of Heterogeneous Data Sources

This paper presents a new clustering technique for the semantic integration of heterogeneous databases using two-step hierarchical clustering and agglomerative clustering with self-organizing maps (SOM). Two-step clustering determines the number of clusters automatically and improves the clustering result of the SOM. Prior to clustering, the database schemas and records are preprocessed into input values in the range [0, 1]. The clustering results improved significantly when two-step clustering was used with the SOM, yielding more precise similarity among attributes of databases from heterogeneous data sources.
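
As an illustration of the preprocessing and SOM stages summarized above, the following is a minimal sketch rather than the implementation evaluated in this paper. It assumes that each database attribute has already been described by a numeric feature vector (the choice of features is hypothetical), uses min-max scaling to reach the [0, 1] input range, and trains a small one-dimensional SOM. The function names `min_max_scale`, `train_som`, and `best_matching_unit`, and all parameter values, are assumptions introduced for this example. The two-step stage that selects the number of clusters is not shown.

```python
import math
import random


def min_max_scale(rows):
    """Scale each column of a numeric table into the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def best_matching_unit(weights, x):
    """Return the index of the SOM unit closest to x (squared Euclidean)."""
    def sq_dist(w):
        return sum((wk - xk) ** 2 for wk, xk in zip(w, x))
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j]))


def train_som(data, n_units=4, epochs=50, lr0=0.5, seed=0):
    """Train a 1-D SOM on data in [0, 1]; return the unit weight vectors.

    The linear decay schedules for learning rate and neighborhood radius
    are illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = n_units / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighborhood centered on the best matching unit
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights
```

With hypothetical per-attribute features such as field length and fraction of numeric characters, `train_som(min_max_scale(features))` returns the unit weights, and `best_matching_unit` assigns each attribute to a unit. Attributes from different sources that land on the same or neighboring units are candidates for semantic correspondence.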