Semantic labeling for quantitative data using Wikidata

Semantic labeling for numerical attributes is the process of matching numerical attributes in tabular resources to properties and classes in knowledge bases. It can be used in many applications, such as table search, table extension, and knowledge augmentation. One challenge of this task is to distinguish numerical attributes expressed in various scales or units of measurement. For example, a method must recognize that “human height centimeters” and “human height feet” are similar attributes, while “population million” is a dissimilar one. Previous studies assume that similar attributes are expressed in the same scale. In practice, similar attributes can be expressed differently, because data resources are constructed by different people with different backgrounds and in different contexts. In this paper, we propose a novel method to improve the performance of semantic labeling for numerical attributes in various scales. We use external knowledge about unit conversion taken from Wikidata to generate additional data for the numerical background knowledge base (WBKB). Our empirical experiments show that using the WBKB improves the performance of semantic labeling for numerical attributes expressed in various scales.
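The idea of augmenting a background knowledge base with unit conversions can be illustrated with a minimal Python sketch. It is not the method evaluated in this paper: the sample values, the hard-coded conversion factors (a stand-in for conversion knowledge that would be taken from Wikidata), the function names `augment`, `ks_statistic`, and `label_column`, and the use of a two-sample Kolmogorov-Smirnov statistic as the similarity measure are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical background knowledge: one list of values per attribute,
# stored in a single base unit (centimetres, persons).
BACKGROUND = {
    "height": [150, 160, 165, 170, 175, 180, 190],
    "population": [1.2e6, 3.5e6, 8.0e6, 2.1e7, 5.0e7],
}

# Hypothetical conversion factors from the base unit to other units or scales.
# In the setting described above these would come from Wikidata.
CONVERSIONS = {
    "height": {"centimetre": 1.0, "metre": 0.01, "foot": 1 / 30.48},
    "population": {"persons": 1.0, "millions": 1e-6},
}


def augment(background, conversions):
    """Generate one value list per (attribute, unit) pair."""
    return {
        (attr, unit): [v * factor for v in values]
        for attr, values in background.items()
        for unit, factor in conversions[attr].items()
    }


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def label_column(column, knowledge):
    """Return the (attribute, unit) whose values are closest to the column."""
    return min(knowledge, key=lambda key: ks_statistic(column, knowledge[key]))


# A query column of human heights expressed in feet.
query = [5.1, 5.4, 5.6, 5.9, 6.1]
augmented = augment(BACKGROUND, CONVERSIONS)
print(label_column(query, augmented))
```

Without the augmentation step, the query column in feet does not overlap with the stored heights in centimetres at all, so a distribution-based comparison cannot tell the two attributes are related. After augmentation, the converted height values in feet become the closest candidate.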