Automatic Ontology Extraction with Text Clustering

This paper presents a technique for automatically deriving ontologies based on hierarchical clustering of document corpora. Applied to a set of texts forming a domain document corpus, the procedure builds a hierarchical structure (a tree) in which each node is associated with a set of terms derived from the document feature vectors. Clusters are labeled using a new algorithm presented in this work. The derived terms may represent candidate concepts for building a domain taxonomy, from which the hierarchical relationships among the classes of the domain ontology can be extracted. To test the technique, a prototype tool named OntoClust has been built.
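The following Python code is a rough sketch of the pipeline outlined above. It builds a cluster tree over a toy corpus and attaches a few terms to every node. It does not reproduce the authors' method, and it rests on several assumptions. The feature vectors are raw lowercase term counts. The clustering is average-linkage agglomerative clustering with cosine similarity. Each node is labeled with the top-k terms of its cluster centroid, which stands in for the paper's own labeling algorithm because that algorithm is not described here. All names in the code (`Node`, `build_tree`, `label_tree`, `show`, the parameter `k`) are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    docs: list[int]                                   # indices of documents in this cluster
    children: list[Node] = field(default_factory=list)
    label: list[str] = field(default_factory=list)    # candidate concept terms


def tf_vector(text: str) -> Counter:
    # Assumed feature extraction: lowercase word counts, no stemming or stop-word removal.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def average_link(vectors: list[Counter], a: Node, b: Node) -> float:
    # Mean pairwise cosine similarity between the documents of two clusters.
    pairs = [(i, j) for i in a.docs for j in b.docs]
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def label_tree(node: Node, vectors: list[Counter], k: int) -> None:
    # Stand-in labeling (not the paper's algorithm): top-k centroid terms,
    # ties broken alphabetically so the result is deterministic.
    centroid = Counter()
    for i in node.docs:
        centroid.update(vectors[i])
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    node.label = [term for term, _ in ranked[:k]]
    for child in node.children:
        label_tree(child, vectors, k)


def build_tree(texts: list[str], k: int = 2) -> Node:
    if not texts:
        raise ValueError("corpus must contain at least one document")
    vectors = [tf_vector(t) for t in texts]
    clusters = [Node(docs=[i]) for i in range(len(texts))]
    while len(clusters) > 1:
        # Merge the most similar pair of clusters (average linkage).
        x, y = max(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: average_link(vectors, clusters[pq[0]], clusters[pq[1]]),
        )
        merged = Node(docs=clusters[x].docs + clusters[y].docs,
                      children=[clusters[x], clusters[y]])
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    root = clusters[0]
    label_tree(root, vectors, k)
    return root


def show(node: Node, depth: int = 0) -> None:
    print("  " * depth + ", ".join(node.label), node.docs)
    for child in node.children:
        show(child, depth + 1)


if __name__ == "__main__":
    corpus = [
        "ontology concept class taxonomy",
        "ontology taxonomy concept hierarchy",
        "cluster document vector similarity",
        "cluster document corpus vector",
    ]
    show(build_tree(corpus, k=2))
```

On this toy corpus the two topical groups end up as separate subtrees. Their labels (`concept, ontology` and `cluster, document`) are candidate concepts of the kind the text describes. The root label is simply the two most frequent terms overall, which shows why plain centroid weighting is a weak labeling heuristic near the top of the tree.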