MTab4Wikidata at SemTab 2020: Tabular Data Annotation with Wikidata

This paper introduces MTab4Wikidata, an automatic semantic annotation system for the three annotation tasks of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2020): Cell-Entity Annotation (CEA), Column-Type Annotation (CTA), and Column Relation-Property Annotation (CPA). In particular, we introduce (1) a novel fuzzy entity search to handle misspelled table cells, (2) a fuzzy statement search to deal with ambiguous cells, (3) a statement enrichment module to address the Wikidata shifting issue, and (4) efficient and effective post-processing for the matching tasks. Our system achieves strong empirical performance on all three annotation tasks and won the first prize at SemTab 2020. MTab4Wikidata ranks first in the CEA and CPA tasks and second in the CTA task on the Round 1, 2, and 3 datasets, and ranks first on the Round 4 dataset and the Tough Tables (2T) dataset.
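To make the idea of a fuzzy entity search for misspelled cells concrete, the following is a minimal sketch, not the MTab4Wikidata implementation. It assumes a tiny in-memory label index (the hypothetical `TOY_LABEL_INDEX`) and uses plain Levenshtein edit distance with an assumed threshold `max_distance`; the function names and the threshold are illustrative choices, and the actual system may index and score candidates differently.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Hypothetical toy index from lowercased labels to Wikidata item IDs.
# A real system would query a full label index, not a three-entry dict.
TOY_LABEL_INDEX: Dict[str, str] = {
    "paris": "Q90",
    "berlin": "Q64",
    "london": "Q84",
}


def fuzzy_entity_search(
    cell: str,
    index: Dict[str, str] = TOY_LABEL_INDEX,
    max_distance: int = 2,  # assumed threshold, chosen for illustration
) -> List[Tuple[str, str, int]]:
    """Return (item_id, label, distance) candidates, closest first."""
    query = cell.strip().lower()
    scored = [(edit_distance(query, label), label, qid)
              for label, qid in index.items()]
    return [(qid, label, dist)
            for dist, label, qid in sorted(scored)
            if dist <= max_distance]


if __name__ == "__main__":
    print(fuzzy_entity_search("Pariss"))
    print(fuzzy_entity_search("Berln"))
```

In this sketch a cell such as "Pariss" still retrieves the item for Paris because it is within the edit-distance threshold, while a string far from every label returns no candidates. A scan over all labels is only workable for a toy index; at Wikidata scale some form of candidate pre-filtering would be needed.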