Making Sense of Numerical Data - Semantic Labelling of Web Tables

With the increasing amount of structured data on the web, the need to understand and support search over this emerging data space is growing. Adding semantics to structured data can help address existing challenges in data discovery, as it facilitates understanding values in their context. While there are approaches for lifting structured data to semantic web formats to enrich it and facilitate discovery, most work to date focuses on textual fields rather than numerical data. In this paper, we propose NUMER, a two-level (row- and column-based) approach for adding semantic meaning to numerical values in tables. We evaluate our approach using a benchmark (NumDB) generated for the purpose of this work. We show the influence of the different levels of analysis on the success of assigning semantic labels to numerical values in tables. Our approach outperforms the state of the art and is less affected by data structure and quality issues such as a small number of entities or deviations in the data.
