The SEWASIE MAS for semantic search

The capillary diffusion of the Internet has given access to an overwhelming amount of data, from which users can potentially draw vast information. However, this information is not directly usable: Internet data are heterogeneous and spread over different places, with many duplications and inconsistencies. Integrating such heterogeneous and inconsistent data through data reconciliation and data fusion techniques may therefore be a key activity for enabling a more organized and semantically meaningful access to data sources. Two issues in particular must be solved: the discovery and explicit specification of the relationships between abstract data concepts, and the need for data reliability in a dynamic, constantly changing network. Ontologies provide a key mechanism for addressing these challenges, but the web’s dynamic nature leaves open the question of how to manage them. Many solutions based on ontology creation by a mediator system have been proposed: a unified virtual view (the ontology) of the underlying data sources is built, giving users transparent access to the integrated data sources [1, 2, 3].

The centralized architecture of a mediator system, however, has several limitations, which are emphasized in the hidden web [4]. First, web data sources hold information according to their particular view of the matter, i.e. each of them uses a specific ontology to represent its data. Second, data sources are usually isolated, i.e. they do not share any topological information concerning the content or structure of other sources. Our proposal is to develop a network of ontology-based mediator systems, where mediators are not isolated from each other and include tools for sharing and mapping their ontologies. In this paper, we describe the use of a multi-agent architecture to build and manage the mediator network. The functional architecture is composed of single peers (implemented as
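The idea of a network of ontology-based mediators can be illustrated with a minimal Python sketch. It is not the SEWASIE implementation: the classes `DataSource` and `Mediator`, the view-style attribute mappings, the pairwise peer concept mappings, and the exact-duplicate reconciliation step are hypothetical simplifications, and message passing between agents is reduced to direct method calls. Each mediator answers a query over one of its own concepts by unfolding it onto its sources, then rewrites the query through its mappings to neighbouring mediators and renames their answers back into its own vocabulary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSource:
    """Hypothetical wrapped source; rows use the source's own attribute names."""
    name: str
    rows: list[dict]


@dataclass
class Mediator:
    """Hypothetical ontology-based mediator acting as one peer of the network.

    ontology: own concept -> list of (source, {own attr: source attr}),
              i.e. each concept is defined as a view over the sources.
    peers:    peer name -> (peer, {own concept: (peer concept, {own attr: peer attr})})
    """
    name: str
    ontology: dict = field(default_factory=dict)
    peers: dict = field(default_factory=dict)

    def add_peer(self, other: Mediator, concept_map: dict) -> None:
        # Store a (hypothetical) mapping from this mediator's ontology to the peer's.
        self.peers[other.name] = (other, concept_map)

    def local_answer(self, concept: str, attrs: list[str]) -> list[dict]:
        # Unfold the concept onto every source that can supply all requested attrs.
        results = []
        for source, amap in self.ontology.get(concept, []):
            if all(a in amap for a in attrs):
                results.extend({a: row[amap[a]] for a in attrs} for row in source.rows)
        return results

    def query(self, concept: str, attrs: list[str], visited: set | None = None) -> list[dict]:
        visited = set() if visited is None else visited
        visited.add(self.name)  # stops forwarding from looping around cycles of peers
        results = self.local_answer(concept, attrs)
        for peer_name, (peer, concept_map) in self.peers.items():
            if peer_name in visited or concept not in concept_map:
                continue
            remote_concept, attr_map = concept_map[concept]
            if not all(a in attr_map for a in attrs):
                continue
            # Rewrite the query into the peer's vocabulary, then rename answers back.
            remote_rows = peer.query(remote_concept, [attr_map[a] for a in attrs], visited)
            results.extend({a: row[attr_map[a]] for a in attrs} for row in remote_rows)
        # Naive reconciliation: drop exact duplicates, keeping first-seen order.
        unique, seen = [], set()
        for r in results:
            key = tuple(sorted(r.items()))
            if key not in seen:
                seen.add(key)
                unique.append(r)
        return unique


# Usage: two sources describing companies with different local vocabularies.
s1 = DataSource("catalogue", [{"org_name": "Acme", "hq_city": "Modena"}])
s2 = DataSource("registry", [{"company": "Acme", "town": "Modena"},
                             {"company": "Beta", "town": "Bologna"}])
m1 = Mediator("m1", ontology={"Enterprise": [(s1, {"name": "org_name", "city": "hq_city"})]})
m2 = Mediator("m2", ontology={"Firm": [(s2, {"label": "company", "location": "town"})]})
m1.add_peer(m2, {"Enterprise": ("Firm", {"name": "label", "city": "location"})})
print(m1.query("Enterprise", ["name", "city"]))
# → [{'name': 'Acme', 'city': 'Modena'}, {'name': 'Beta', 'city': 'Bologna'}]
```

In this sketch, a query posed against `m1`'s ontology reaches the `registry` source that only `m2` integrates, which is the kind of cooperation an isolated, centralized mediator cannot offer. A realistic network would additionally need mapping discovery, reconciliation beyond exact duplicates, and asynchronous communication between agents.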

[1] Andrea Calì et al. Data integration under integrity constraints, 2004, Inf. Syst.

[2] Domenico Beneventano et al. Semantic search engines based on data integration systems, 2006.

[3] Nicolás Marín et al. Review of "Data on the Web: from relational to semistructured data and XML" by Serge Abiteboul, Peter Buneman, and Dan Suciu (Morgan Kaufmann, 1999), 2003, SGMD.

[4] Dan Suciu et al. Data on the Web: From Relations to Semistructured Data and XML, 1999.

[5] Georg Gottlob et al. Visual Web Information Extraction with Lixto, 2001, VLDB.

[6] Gerhard Weiss et al. Multiagent systems: a modern approach to distributed artificial intelligence, 1999.

[7] Valter Crescenzi et al. RoadRunner: Towards Automatic Data Extraction from Large Web Sites, 2001, VLDB.

[8] Darrell Woelk et al. InfoSleuth: networked exploitation of information using semantic agents, 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.

[9] Maurizio Vincini et al. Synthesizing an Integrated Ontology, 2003, IEEE Internet Comput.

[10] Sally McClean et al. Agents for Querying Distributed Statistical Databases Over the Internet, 2002, Int. J. Artif. Intell. Tools.

[11] Peter B. Danzig et al. The Harvest Information Discovery and Access System, 1995, Comput. Networks ISDN Syst.

[12] W. Bruce Croft et al. Searching distributed collections with networks, 1995.

[13] Divesh Srivastava et al. The Information Manifold, 1995.

[14] Jennifer Widom et al. The TSIMMIS Approach to Mediation: Data Models and Languages, 1997, Journal of Intelligent Information Systems.

[15] Jussi Myllymaki. Effective Web data extraction with standard XML technologies, 2002, Comput. Networks.

[16] Jayant Madhavan et al. Composing Mappings Among Data Sources, 2003, VLDB.

[17] Sriram Raghavan et al. Crawling the Hidden Web, 2001, VLDB.

[18] Domenico Beneventano et al. Final release of the system prototype for query management, 2005.

[19] Silvana Castano et al. Global Viewing of Heterogeneous Data Sources, 2001, IEEE Trans. Knowl. Data Eng.