Paper information - Semiautomatic Generation of Data-Extraction Ontologies from Relational Databases

Data extraction is the process of gathering and structuring information found in documents (e.g., Web pages). One approach is ontology-based data extraction, in which an ontology guides the parser that extracts data from the source documents. In this context, an ontology is a conceptual schema enriched with the information needed to identify data items in the sources. Creating such an ontology is not a trivial task and may require analyzing a large number of document instances. However, in many extraction applications, the information being extracted may already be modeled in a relational database. In this case, the relational database schema can serve as a starting point for constructing a data-extraction ontology, and analysis of the data instances stored in the database can help generate the information used to parse data items in document sources. This paper presents a method for the semiautomatic creation of a data-extraction ontology, based on reverse engineering of the relational database schema combined with analysis of data instances.
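The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: read the schema from a database catalog, then generalize the stored values of each column into regular expressions that a parser could use to recognize similar data items. It is not the authors' method. The SQLite backend, the `book` table, its invented rows, and the helper names `generalize`, `build_ontology`, and `demo_connection` are all assumptions made for illustration.

```python
import re
import sqlite3
from itertools import groupby


def generalize(value: str) -> str:
    """Turn one stored value into a regex by collapsing runs of similar characters."""
    def char_class(ch: str) -> str:
        if ch.isascii() and ch.isdigit():
            return r"\d"
        if ch.isascii() and ch.isalpha():
            return r"[A-Za-z]"
        return re.escape(ch)  # punctuation, spaces, non-ASCII: match literally

    parts = []
    for cls, run in groupby(value, key=char_class):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def build_ontology(conn: sqlite3.Connection) -> dict:
    """Map each 'table.column' to the distinct value patterns seen in its data."""
    ontology = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Schema step: column names come from the catalog (names assumed trusted).
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
        for column in columns:
            # Instance step: generalize every distinct non-null value.
            rows = conn.execute(
                f"SELECT DISTINCT {column} FROM {table} WHERE {column} IS NOT NULL")
            ontology[f"{table}.{column}"] = sorted(
                {generalize(str(row[0])) for row in rows})
    return ontology


def demo_connection() -> sqlite3.Connection:
    """Hypothetical in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, year INTEGER, isbn TEXT)")
    conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
        ("Example Book A", 1998, "0-111-22222-3"),
        ("Example Book B", 1991, "0-444-55555-6"),
    ])
    return conn


if __name__ == "__main__":
    for concept, patterns in build_ontology(demo_connection()).items():
        print(concept, patterns)
```

In this sketch each `table.column` plays the role of a concept taken from the schema, and the derived patterns stand in for the recognition information attached to it. The "semiautomatic" part would come in afterwards, when a person reviews the result: for example, merging or loosening patterns that are too specific to the sampled values.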

Carlos Alberto Heuser | Orlando Miguel Vivan | C. Heuser
