Whether on a web-wide or inter-enterprise scale, data integration has become a major necessity, driven by the expansion of the Internet and its widespread use for communication between business actors. However, since data sources are often heterogeneous, their integration remains an expensive procedure. Indeed, this task requires prior semantic alignment of the concepts of all the data sources. Doing this alignment manually is laborious, especially when a large number of concepts must be matched. Various solutions have been proposed to automate this step. This paper introduces a new framework for data source alignment that integrates context analysis with multi-strategy machine learning. Although their adaptability and extensibility are appreciated, current machine learning systems often suffer from the low quality and lack of diversity of their training data sets. To overcome this limitation, we introduce a new notion called the “informational context” of data sources. We then briefly describe the architecture of a context analyser to be integrated into a learning system that combines multiple strategies to achieve data source mapping.