Paper Information - A Proposal for the Automatic Generation of Instances from Unstructured Text


An ontology is a conceptual representation of a domain resulting from a consensus within a community. One of its main applications is the integration of heterogeneous information sources available on the Web by means of the semantic annotation of web documents. This is the cornerstone of the emerging Semantic Web. However, most of the information on the Web currently consists of text documents with little or no structure, which makes their manual annotation impracticable. This paper addresses the problem of mapping text fragments into a given ontology in order to generate ontology instances that semantically describe this kind of resource. By applying this mapping, we can automatically populate a Semantic Web consisting of text documents that concern a specific ontology. We have evaluated our approach on a real-application ontology and a text collection, both in the Archeology domain. Results show the effectiveness of the method as well as its usefulness.

Rafael Berlanga Llavori | Roxana Dánger Mercaderes | José Ruiz-Shulcloper | Ismael Sanz

[1] Steffen Staab, et al. Bootstrapping an Ontology-Based Information Extraction System, 2003, Intelligent Exploration of the Web.

[2] James A. Hendler, et al. "The Semantic Web" in Scientific American, 2001.

[3] Thomas R. Gruber, et al. Toward principles for the design of ontologies used for knowledge sharing, 1995, Int. J. Hum. Comput. Stud.

[4] Pedro M. Domingos, et al. Learning to match ontologies on the Semantic Web, 2003, The VLDB Journal.

[5] Rafael Berlanga Llavori, et al. Text Mining Using the Hierarchical Syntactical Structure of Documents, 2003, CAEPIA.

[6] Laura Farinetti, et al. Can Data Mining Techniques Ease The Semantic Tagging Burden?, 2003, SWDB.

[7] Douglas E. Appelt, et al. Introduction to Information Extraction, 1999, AI Commun.