Semantic Information Generation from Classification and Information Extraction

This paper presents MASTERWeb, a multi-agent system for classification and information extraction from Web pages. The multi-agent approach allows agents, each specialized in one of the page classes of a cluster, to share common information through a cooperation process. The goal of the system is to provide the user with information that is less noisy and more focused on the user's interests. To represent the domain knowledge, the system uses ontologies and frames [4]. The extraction module exploits the implicit structure of each page class to extract information efficiently. It consists of an expert system in which the knowledge is stored using ontologies. MASTERWeb is a cognitive multi-agent system for the integrated manipulation of information, in which each agent is responsible for classifying page contents within a knowledge domain [2]. The system is based on the principle that some page classes may be interrelated: for instance, pages of the class “scientific events” may contain information about, or links to, pages of the class “researchers” through the attribute “chairman of the event”.
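The cooperation between agents of interrelated page classes can be illustrated with a minimal Python sketch. It is not the MASTERWeb implementation: the names `Frame`, `Agent`, `CROSS_CLASS_SLOTS`, and `share`, the keyword-based classifier, and the sample data are all hypothetical and chosen only for illustration. The one element taken from the text is the link from “scientific events” to “researchers” through the attribute “chairman of the event”.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: a page class name plus the slot values extracted from a page."""
    page_class: str
    slots: dict = field(default_factory=dict)


@dataclass
class Agent:
    """Hypothetical agent responsible for one page class."""
    page_class: str
    keywords: tuple
    inbox: list = field(default_factory=list)

    def classify(self, text: str) -> bool:
        # Toy classifier (an assumption): the page belongs to the class
        # if any of the agent's keywords occurs in the text.
        lowered = text.lower()
        return any(keyword in lowered for keyword in self.keywords)

    def receive(self, hint: str) -> None:
        # Cooperation step: store a hint sent by another agent.
        self.inbox.append(hint)


# Assumed mapping: (source page class, slot) -> page class to notify.
CROSS_CLASS_SLOTS = {
    ("scientific events", "chairman of the event"): "researchers",
}


def share(frame: Frame, agents: dict) -> list:
    """Forward slot values that refer to another page class to its agent."""
    notified = []
    for slot, value in frame.slots.items():
        target = CROSS_CLASS_SLOTS.get((frame.page_class, slot))
        if target in agents:
            agents[target].receive(value)
            notified.append(target)
    return notified


if __name__ == "__main__":
    agents = {"researchers": Agent("researchers", ("professor", "publications"))}
    event = Frame(
        "scientific events",
        {"name": "Example Workshop", "chairman of the event": "A. Researcher"},
    )
    print(share(event, agents))
    print(agents["researchers"].inbox)
```

In this sketch the agent for “scientific events” does not need to know how researcher pages are classified; it only forwards the value of the linking attribute, which the “researchers” agent can then use as a hint.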