A significant amount of data on the Web is in XML format or can easily be converted to XML or one of its variants. XML remains one of the most appropriate languages for data interchange and serialization. This paper presents a new framework that can integrate heterogeneous XML data sources. Each data source is translated into semantically meaningful regular expressions without modifying the original source. The proposed framework has two major data-preparation phases. In the first phase, each data source is processed, using a known global semantic vocabulary as input, to obtain regular expressions that conform to the design choices made in the target. The second phase combines these regular expressions into a global schema while preserving the original source data. The paper also introduces a regular expression generator tool, which produces regular expressions with respect to the vocabulary, and an integrator toolbox, which integrates and processes the regular expressions.
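The abstract does not say how the regular expressions are represented or combined. The sketch below makes several assumptions. In the first phase, each element's sequence of children is treated as a DTD-style content model written over global vocabulary terms. In the second phase, the per-source models are merged by alternation. The vocabulary `GLOBAL_VOCAB`, the functions `content_models` and `integrate`, and the sample sources are all hypothetical. They illustrate the two-phase idea and are not the paper's generator or integrator tools.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical global vocabulary mapping source-local tag names to shared
# global terms (an assumption; the paper's vocabulary format is not shown).
GLOBAL_VOCAB = {
    "catalog": "Catalog", "store": "Catalog",
    "book": "Book", "item": "Book",
    "title": "Title", "name": "Title",
    "author": "Author", "writer": "Author",
}

# Hypothetical sample sources with different local tag names.
SOURCE_A = "<catalog><book><title>X</title><author>A</author></book></catalog>"
SOURCE_B = ("<store><item><name>Y</name><writer>B</writer><writer>C</writer></item>"
            "<item><name>Z</name><writer>D</writer></item></store>")


def content_models(xml_text, vocab):
    """Phase 1 sketch: derive a DTD-style regular expression over global
    names for each element's children. The source XML is only read."""
    root = ET.fromstring(xml_text)
    models = defaultdict(set)
    for elem in root.iter():  # iter() includes the root element itself
        children = [vocab.get(c.tag, c.tag) for c in elem]
        if not children:
            continue  # leaf elements carry text, not structure
        runs = []  # collapse consecutive repeats into [name, count] pairs
        for name in children:
            if runs and runs[-1][0] == name:
                runs[-1][1] += 1
            else:
                runs.append([name, 1])
        expr = ",".join(name + ("+" if count > 1 else "") for name, count in runs)
        models[vocab.get(elem.tag, elem.tag)].add(expr)
    return models


def integrate(*source_models):
    """Phase 2 sketch: union the per-source alternatives for each global
    element into one global schema (element name -> regular expression)."""
    merged = defaultdict(set)
    for models in source_models:
        for name, exprs in models.items():
            merged[name] |= exprs
    schema = {}
    for name, exprs in merged.items():
        alts = sorted(exprs)
        schema[name] = alts[0] if len(alts) == 1 else "|".join(f"({a})" for a in alts)
    return schema


if __name__ == "__main__":
    schema = integrate(content_models(SOURCE_A, GLOBAL_VOCAB),
                       content_models(SOURCE_B, GLOBAL_VOCAB))
    for name, expr in sorted(schema.items()):
        print(name, "->", expr)
    # Book -> (Title,Author)|(Title,Author+)
    # Catalog -> (Book)|(Book+)
```

The union in `integrate` is deliberately naive. It keeps both `Book` and `Book+` even though the second subsumes the first, so a fuller integrator would also simplify the resulting expressions.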