Identifying Dominant Nodes in Semantic Taxonomies

In this paper, we propose the “Dominance Metric”, a scalable and modular methodology for measuring the importance of nodes in semantic taxonomies. It rests on the assumption that dominance derives not only from high volume, namely the number of publications a tag is related to, but also from a mixture of structural and topological properties. We apply the methodology to the vast multidisciplinary Microsoft Academic Graph Fields of Study taxonomy in order to produce a refined and enhanced version. Finally, we describe the cleansing process of the resulting taxonomy, which improves its representation quality by deduplicating and merging noisy concepts within the original structure. The subsequent evaluation provided valuable insights and indicates that the results of the proposed approach are promising.