Paper Information - Ontology Learning Through Focused Crawling and Information Extraction

Ontology Learning Through Focused Crawling and Information Extraction

Ontology learning aims to reduce the effort required to construct an ontology for a new domain. However, few studies attempt to automate the entire ontology learning process, from collecting domain-specific literature to mining that text in order to build new ontologies or enrich existing ones. In this paper, we present a complete framework for ontology learning that retrieves documents from the Web using focused crawling, uses an SVM (Support Vector Machine) classifier to identify domain-specific documents, and then performs text mining to extract useful information for the ontology enrichment process. We have carried out several experiments on components of this framework in a biological domain, amphibian morphology. This paper reports on the overall system architecture and our initial experiments on information extraction using text mining techniques to enrich the domain ontology.
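
The crawl-then-filter stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `TOY_WEB`, the `DOMAIN_TERMS` vocabulary, and the function names are all hypothetical, and a simple keyword-overlap score stands in for the trained SVM classifier. It only shows the general idea of a best-first focused crawl that follows links from pages judged relevant and ignores links from off-topic pages.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("amphibian morphology overview", ["frog", "cars"]),
    "frog": ("frog skeleton and limb morphology", ["tadpole"]),
    "cars": ("used cars for sale", ["trucks"]),
    "tadpole": ("tadpole tail anatomy in amphibian larvae", []),
    "trucks": ("pickup trucks and trailers", []),
}

# Hypothetical domain vocabulary; a stand-in for a trained SVM classifier.
DOMAIN_TERMS = {
    "amphibian", "morphology", "frog", "tadpole",
    "anatomy", "skeleton", "limb",
}


def relevance(text):
    """Return the fraction of words in `text` that are domain terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in DOMAIN_TERMS for w in words) / len(words)


def focused_crawl(web, seed, threshold=0.3, max_pages=10):
    """Best-first crawl that expands links only from relevant pages."""
    frontier = [(-1.0, seed)]  # min-heap on negated priority = max-heap
    seen = {seed}
    kept = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web.get(url, ("", []))
        fetched += 1
        score = relevance(text)
        if score < threshold:
            continue  # off-topic page: do not follow its links
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance score of the page it came from.
                heapq.heappush(frontier, (-score, link))
    return kept


if __name__ == "__main__":
    print(focused_crawl(TOY_WEB, "seed"))
```

In a fuller pipeline, the pages returned by `focused_crawl` would be passed on to the text mining step, and `relevance` would be replaced by a classifier trained on labelled domain documents.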

Qiang Wang | Hiep Phuc Luong | Susan Gauch

[1] Hans-Peter Kriegel, et al. Focused Web Crawling: A Generic Framework for Specifying the User Interest and for Adaptive Crawling Strategies, 2001.

[2] Qiang Wang, et al. Ontology-Based Focused Crawling, 2009, 2009 International Conference on Information, Process, and Knowledge Management.

[3] Philipp Cimiano, et al. Ontology Learning from Text: Methods, Evaluation and Applications, 2005.

[4] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[5] Olatz Ansa, et al. Enriching very large ontologies using the WWW, 2000, ECAI Workshop on Ontology Learning.

[6] Jennifer L. Leopold, et al. An Anatomical Ontology for Amphibians, 2006, Pacific Symposium on Biocomputing.

[7] James A. Hendler, et al. "The Semantic Web" in Scientific American, 2001.

[8] Borys Omelayenko, et al. Learning of Ontologies from the Web: the Analysis of Existent Approaches, 2001, WebDyn@ICDT.

[9] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[10] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[11] Shi Bing, et al. Inductive learning algorithms and representations for text categorization, 2006.

[12] Steffen Staab, et al. Ontology Learning for the Semantic Web, 2002, IEEE Intell. Syst.

[13] Martin van den Berg, et al. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery, 1999, Comput. Networks.

[14] Yiming Yang, et al. A scalability analysis of classifiers in text categorization, 2003, SIGIR.

[15] Susan Gauch, et al. Using Text Mining to Enrich the Vocabulary of Domain Ontologies, 2008, 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology.