Ontology Learning Through Focused Crawling and Information Extraction

Ontology learning aims to facilitate the construction of ontologies by reducing the effort required to produce an ontology for a new domain. However, few studies attempt to automate the entire ontology learning process, from collecting domain-specific literature to mining that text in order to build new ontologies or enrich existing ones. In this paper, we present a complete framework for ontology learning that retrieves documents from the Web using focused crawling, uses a Support Vector Machine (SVM) classifier to identify domain-specific documents, and then performs text mining to extract useful information for the ontology enrichment process. We have carried out several experiments on components of this framework in a biological domain, amphibian morphology. This paper reports on the overall system architecture and on our initial experiments in information extraction, which use text mining techniques to enrich the domain ontology.
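
To make the crawl-then-filter stage described above concrete, the following is a minimal sketch of best-first focused crawling, not the paper's implementation. It runs over a hypothetical in-memory link graph (`TOY_WEB`) instead of the live Web, and a simple keyword-overlap scorer (`keyword_relevance`) stands in for the SVM classifier; the page texts, the term list, the threshold, and all function names are assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical toy "web": URL -> (page text, outgoing links). Illustration only.
TOY_WEB: Dict[str, Tuple[str, List[str]]] = {
    "seed": ("amphibian morphology overview", ["frog", "cars"]),
    "frog": ("frog skull and limb morphology", ["tadpole"]),
    "cars": ("used cars for sale", ["trucks"]),
    "tadpole": ("tadpole tail anatomy in amphibian larvae", []),
    "trucks": ("pickup trucks reviews", []),
}

# Assumed domain vocabulary; a real system would learn relevance from labeled data.
DOMAIN_TERMS: Set[str] = {
    "amphibian", "morphology", "frog", "tadpole", "anatomy", "skull", "limb",
}


def keyword_relevance(text: str) -> float:
    """Stand-in for the SVM classifier: fraction of tokens that are domain terms."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in DOMAIN_TERMS for token in tokens) / len(tokens)


def focused_crawl(
    seed: str,
    fetch: Callable[[str], Tuple[str, List[str]]],
    score: Callable[[str], float],
    threshold: float = 0.2,
    max_pages: int = 10,
) -> List[str]:
    """Best-first crawl: expand links only from pages judged relevant."""
    frontier: List[Tuple[float, str]] = [(-1.0, seed)]  # max-heap via negated priority
    seen: Set[str] = {seed}
    relevant: List[str] = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1
        relevance = score(text)
        if relevance < threshold:
            continue  # off-topic page: neither keep it nor follow its links
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the relevance of the page it was found on.
                heapq.heappush(frontier, (-relevance, link))
    return relevant


if __name__ == "__main__":
    print(focused_crawl("seed", TOY_WEB.__getitem__, keyword_relevance))
```

In this sketch the off-topic page `cars` is fetched and scored but never expanded, so `trucks` is never visited. Replacing `keyword_relevance` with a trained classifier's decision function would bring the sketch closer to the architecture the abstract outlines.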
