Measuring the Specificity of Terms for Automatic Hierarchy Construction

This paper introduces new methods for measuring the specificity of terms using compositional and contextual information. The specificity of a term is the quantity of domain-specific information contained in the term. Specific terms carry a larger quantity of domain information than general terms. Specificity is a necessary condition for building hierarchical relations among terms: if X1 is a descendant of X2, then the specificity of X1 is greater than that of X2. Because domain-specific terms are commonly compounds of a generic-level term and some modifiers, compositional information is important for representing the meaning of terms. Contextual information is also used to mitigate the shortcomings of compositional information. Because information theory is a well-known formalism for describing information, we adopt it to measure the information quantity of terms. As the proposed methods do not use domain-specific information, they can be applied to other domains without additional processing. In experiments on terms in the MeSH thesaurus, the methods achieved a promising precision of 82.0%.
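
To illustrate the information-theoretic idea in general terms, the following minimal sketch scores a term by its self-information, -log2 p(term), with p estimated from relative frequency. This is an assumption made for illustration only and is not the measure defined in the paper; the function name `self_information` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def self_information(term, counts):
    """Return -log2 p(term), with p estimated by relative frequency.

    Illustrative assumption: rarer terms are treated as more specific.
    This is not the paper's compositional or contextual measure.
    """
    total = sum(counts.values())
    return -math.log2(counts[term] / total)


# Hypothetical toy counts, not data from the paper or from MeSH.
counts = Counter({"diabetes": 6, "insulin-dependent diabetes": 2})

general = self_information("diabetes", counts)
specific = self_information("insulin-dependent diabetes", counts)

# Under this toy estimate, the compound (descendant) term scores higher.
print(specific > general)
```

With these toy counts the compound term receives the higher score, which matches the necessary condition stated above: a descendant term should be more specific than its ancestor.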