Ontology Learning from Textual Web Documents

Domain ontologies play an important role in annotating web resources with proper semantic information. The assumption underlying this work is that the noun phrases appearing in a document’s headings, together with the document’s hierarchical structure, can be used to discover the concepts of the documents’ domain and the is-a relations between them. To verify this assumption, a methodology was proposed, and a system was implemented and applied to a set of Arabic agricultural extension documents. The system takes a root concept as input, analyzes the heading structure of all input documents, extracts concepts from the headings, and builds a taxonomical ontology. The resulting ontology was evaluated against a modified version of the AGROVOC ontology, a hand-made ontology developed by the Food and Agriculture Organization of the United Nations (FAO). The lexical F-score was 52.29% for the diseases ontology and 39.64% for the insects ontology. The taxonomical F-score was 44.59% for the diseases ontology and 31.38% for the insects ontology.
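The heading-based idea can be illustrated with a minimal sketch. It is not the implemented system: it assumes each heading has already been reduced to a single concept noun phrase and is given as a (level, concept) pair in document order, and it assumes the lexical F-score is the usual harmonic mean of precision and recall over concept labels. The function names build_taxonomy and lexical_f_score and the sample concepts are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_taxonomy(root: str, headings: Iterable[Tuple[int, str]]) -> Dict[str, Set[str]]:
    """Map each concept to its direct sub-concepts (is-a children).

    Assumption: `headings` holds (level, concept) pairs in document order,
    where level 1 is a top-level heading placed directly under `root`.
    """
    taxonomy: Dict[str, Set[str]] = {root: set()}
    stack = [(0, root)]  # path from the root to the current heading
    for level, concept in headings:
        if level < 1:
            raise ValueError("heading levels must start at 1")
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1][0] >= level:
            stack.pop()
        parent = stack[-1][1]
        if concept != parent:  # avoid a concept being its own parent
            taxonomy.setdefault(parent, set()).add(concept)
        taxonomy.setdefault(concept, set())
        stack.append((level, concept))
    return taxonomy


def lexical_f_score(learned: Iterable[str], reference: Iterable[str]) -> float:
    """Harmonic mean of lexical precision and recall over concept labels.

    Assumption: concepts match by exact label equality.
    """
    learned_set, reference_set = set(learned), set(reference)
    matched = learned_set & reference_set
    if not matched:
        return 0.0
    precision = len(matched) / len(learned_set)
    recall = len(matched) / len(reference_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical heading outline of one document.
    outline = [
        (1, "fungal disease"),
        (2, "powdery mildew"),
        (2, "rust"),
        (1, "viral disease"),
    ]
    learned = build_taxonomy("plant disease", outline)
    # Hypothetical reference concept set standing in for the gold ontology.
    gold = {"plant disease", "fungal disease", "rust", "bacterial disease"}
    print(learned)
    print(lexical_f_score(learned.keys(), gold))
```

The stack keeps the path from the root to the most recent heading, so each new heading is attached to the nearest preceding heading of a shallower level. A taxonomical score would additionally compare the positions of matched concepts in the two hierarchies, which this sketch does not attempt.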