Determining the specificity of terms using inside-outside information: a necessary condition of term hierarchy mining

This paper introduces new methods for measuring the specificity of terms using inside and outside information. The specificity of a term is the quantity of domain-specific information it contains; specific terms carry more domain information than general terms. Specificity is an important necessary condition for building hierarchical relations among terms: if t1 is a hyponym of t2 in a domain term hierarchy, then the specificity of t1 is greater than that of t2. Because domain-specific terms are commonly compounds of a generic-level term and one or more modifiers, inside information is important for representing the meaning of terms. Outside contextual information is used to compensate for the shortcomings of inside information. We propose an information-theoretic method for measuring the quantity of domain information in terms. Experiments showed promising results, with a precision of 73.9% when applied to terms in the MeSH thesaurus.
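
The necessary condition stated above (a hyponym is more specific than its hypernym) can be illustrated with a minimal sketch. It is not the method proposed in this paper: as a stand-in, it assumes specificity is the self-information -log2 p(term) under a simple frequency model. The function names, the toy counts, and the toy hyponym pairs are all hypothetical and chosen only for illustration.

```python
import math

def specificity(term, counts):
    """Illustrative stand-in: self-information -log2 p(term),
    with p(term) estimated from raw term counts (an assumption)."""
    total = sum(counts.values())
    return -math.log2(counts[term] / total)

def satisfies_condition(hyponym, hypernym, counts):
    """Necessary condition from the abstract: spec(hyponym) > spec(hypernym)."""
    return specificity(hyponym, counts) > specificity(hypernym, counts)

def fraction_satisfied(pairs, counts):
    """Fraction of (hyponym, hypernym) pairs that satisfy the condition."""
    ok = sum(satisfies_condition(child, parent, counts) for child, parent in pairs)
    return ok / len(pairs)

# Hypothetical counts and pairs, not data from the paper or from MeSH.
toy_counts = {
    "disease": 150,
    "diabetes mellitus": 40,
    "insulin-dependent diabetes mellitus": 10,
}
toy_pairs = [
    ("insulin-dependent diabetes mellitus", "diabetes mellitus"),
    ("diabetes mellitus", "disease"),
]

if __name__ == "__main__":
    for term in toy_counts:
        print(f"{term}: {specificity(term, toy_counts):.3f} bits")
    print("fraction of pairs satisfying the condition:",
          fraction_satisfied(toy_pairs, toy_counts))
```

In this toy setting, rarer and longer terms receive higher scores, so both pairs satisfy the condition. The fraction computed here only shows how such a condition could be checked against a known hierarchy; it is not a description of how the reported precision was obtained.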