Extraction of Hierarchies Based on Inclusion of Co-occurring Words with Frequency Information

In this paper, we propose a method of automatically extracting word hierarchies based on the inclusion relations of word appearance patterns in corpora. We applied the complementary similarity measure (CSM) to determine a hierarchical structure of word meanings. The CSM is a similarity measure developed for recognizing degraded machine-printed text. There are CSMs for both binary and gray-scale images. The CSM for binary images has been applied to estimate one-to-many relations, such as superordinate-subordinate relations, and to extract word hierarchies. However, the CSM for gray-scale images has not been applied to natural language processing. Here, we apply the latter to extract word hierarchies from corpora. To do this, we used frequency information for co-occurring words, which is not considered when using the CSM for binary images. We compared our hierarchies with those obtained using the CSM for binary images, and evaluated them by measuring their degree of agreement with the EDR electronic dictionary.