TAIPAN: Automatic Property Mapping for Tabular Data

The Web encompasses a significant amount of knowledge hidden in entity-attribute tables. Bridging the gap between these tables and the Web of Data thus has the potential to facilitate a large number of applications, including the augmentation of knowledge bases from tables, the search for related tables and the completion of tables using knowledge bases. Computing such bridges is impeded by the poor accuracy of automatic property mapping, the lack of approaches for the discovery of subject columns and the sheer size of table corpora. We propose Taipan, a novel approach for recovering the semantics of tables. Our approach begins by identifying subject columns using a combination of structural and semantic features. It then maps binary relations inside a table to predicates from a given knowledge base. Our solution thereby supports both table expansion and knowledge base augmentation. We evaluate our approach on a table dataset generated from real RDF data and a manually curated version of the T2D gold standard. Our results suggest that we outperform the state of the art by up to 85% F-measure.
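The two steps named above (subject column identification, then mapping binary relations to predicates) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the scoring features (share of distinct cells, share of non-numeric cells, leftmost tie-break) are simple stand-ins rather than the features Taipan actually uses, and the function names, the toy knowledge base `kb` and its predicate names are hypothetical.

```python
from collections import Counter, defaultdict


def is_number(value):
    """Return True if the cell parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def subject_column(rows):
    """Pick a subject column with two stand-in structural features:
    share of distinct cells times share of non-numeric cells.
    Ties are broken in favour of the leftmost column."""
    def score(col):
        cells = [row[col] for row in rows]
        distinct = len(set(cells)) / len(cells)
        textual = sum(not is_number(c) for c in cells) / len(cells)
        return distinct * textual

    return max(range(len(rows[0])), key=lambda col: (score(col), -col))


def map_properties(rows, subj, kb):
    """For every non-subject column, vote for the predicate that most often
    links the row's subject cell to the row's cell in that column."""
    # Index the toy knowledge base by (subject, object) pair.
    index = defaultdict(set)
    for s, p, o in kb:
        index[(s, o)].add(p)

    mapping = {}
    for col in range(len(rows[0])):
        if col == subj:
            continue
        votes = Counter()
        for row in rows:
            votes.update(index.get((row[subj], row[col]), ()))
        # None marks a column with no supporting triple at all.
        mapping[col] = votes.most_common(1)[0][0] if votes else None
    return mapping


# Toy table and toy knowledge base (hypothetical data and predicate names).
rows = [
    ["Berlin", "Germany", "3645000"],
    ["Paris", "France", "2161000"],
    ["Rome", "Italy", "2873000"],
]
kb = {
    ("Berlin", "country", "Germany"),
    ("Paris", "country", "France"),
    ("Rome", "country", "Italy"),
    ("Berlin", "populationTotal", "3645000"),
    ("Paris", "populationTotal", "2161000"),
}

subj = subject_column(rows)
mapping = map_properties(rows, subj, kb)
```

In this toy run the first column is selected as the subject column, and the remaining columns are mapped by majority vote over the matching triples. A column can still be mapped when some rows have no supporting triple, which is the situation that makes table-based knowledge base augmentation possible.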
