Matching Web Tables with Knowledge Base Entities: From Entity Lookups to Entity Embeddings

Web tables are valuable sources of information for various applications, ranging from Web search to Knowledge Base (KB) augmentation. A common underlying requirement is to annotate the rows of Web tables with semantically rich descriptions of entities published in Web KBs. In this paper, we evaluate three unsupervised annotation methods: (a) a lookup-based method, which relies on the minimal entity context provided in Web tables to discover correspondences to the KB, (b) a semantic embeddings method, which exploits a vectorial representation of the rich entity context in a KB to identify the most relevant subset of entities in the Web table, and (c) an ontology matching method, which exploits schematic and instance information of entities available in both a KB and a Web table. Our experimental evaluation uses two existing benchmark data sets in addition to a new large-scale benchmark created from Wikipedia tables. Our results show that: (1) our novel lookup-based method outperforms state-of-the-art lookup-based methods, (2) the semantic embeddings method outperforms lookup-based methods on one benchmark data set, and (3) the lack of a rich schema in Web tables can limit the ability of ontology matching tools to perform high-quality table annotation. Based on these findings, we propose a hybrid method that significantly outperforms the individual methods on all benchmarks.
