Heterogeneous Deep Web Data Extraction Using Ontology Evolution

This paper proposes a data extraction method based on complex ontology evolution and presents a complete extraction system built from four components: the Resolver, the Extractor, the Consolidator, and the ontology construction component. The system gives priority to constructing a mini-ontology. When a user submits query keywords to a deep web query interface, the returned result pages pass through the first three components, and the final result is then returned to the user in a unified form. The extraction method differs from general ontology-based extraction: the ontology used here evolves dynamically, which lets it adapt better to varied data sources. Experimental results show that the method can effectively extract the data in query result pages.
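The flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how each component works, so every name here (`MiniOntology`, `resolve`, `extract`, `consolidate`, `run_pipeline`), the "label: value" page format, and the representation of the ontology as a map from concepts to known labels are assumptions made only for illustration. How evolution is triggered in the actual system is likewise not stated; here it is a manual call.

```python
from dataclasses import dataclass, field


@dataclass
class MiniOntology:
    """Assumed representation: concept name -> labels known to denote it."""
    concepts: dict = field(default_factory=dict)

    def lookup(self, label):
        """Return the concept a label maps to, or None if it is unknown."""
        key = label.strip().lower()
        for concept, labels in self.concepts.items():
            if key in labels:
                return concept
        return None

    def evolve(self, concept, label):
        """Extend the ontology with a newly observed label for a concept."""
        self.concepts.setdefault(concept, set()).add(label.strip().lower())


def resolve(page):
    """Hypothetical Resolver: split a result page into raw label/value records.

    Assumes records are separated by blank lines and fields look like
    "Label: value"; real result pages would be HTML.
    """
    records = []
    for block in page.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            if ":" in line:
                label, value = line.split(":", 1)
                record[label.strip()] = value.strip()
        if record:
            records.append(record)
    return records


def extract(records, ontology):
    """Hypothetical Extractor: keep fields whose label the ontology recognises."""
    rows = []
    for record in records:
        row = {}
        for label, value in record.items():
            concept = ontology.lookup(label)
            if concept is not None:
                row[concept] = value
        rows.append(row)
    return rows


def consolidate(rows, ontology):
    """Hypothetical Consolidator: give every row the same concept columns."""
    columns = sorted(ontology.concepts)
    return [{column: row.get(column) for column in columns} for row in rows]


def run_pipeline(page, ontology):
    """Pass a result page through Resolver, Extractor and Consolidator."""
    return consolidate(extract(resolve(page), ontology), ontology)


# Two records from sources that label the same concept differently.
page = "Title: A\nWriter: Bob\n\nTitle: B\nAuthor: Eve"
ontology = MiniOntology({"title": {"title"}, "author": {"author"}})

print(run_pipeline(page, ontology))   # "Writer" is not yet recognised
ontology.evolve("author", "Writer")   # the ontology adapts to the new source
print(run_pipeline(page, ontology))   # the same page now yields the author
```

In this sketch, a static ontology would keep missing the `Writer` field; letting the ontology absorb new labels is one simple way to picture the adaptation to heterogeneous sources that the abstract describes.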
