Towards a semantic-driven automatic staging area design for heterogeneous data integration

The volume of information is growing exponentially, forcing corporations to keep their business information distributed across several heterogeneous sources, such as relational databases, spreadsheets, XML documents, and Web pages, each with its own structure and format. Integrating heterogeneous sources has recently been acknowledged as an important goal of Semantic Web research. Heterogeneity arises at different levels, from the lexical level to the semantic and structural levels. To discover and consolidate the semantic relationships among related data held in different types of databases and files, this paper presents the improvements obtained by using large, publicly available online lexical databases in combination with lexical and structural similarity models and the available source metadata. Finally, we report experimental results that demonstrate the applicability and usability of our approach.