Paper information - Effective and efficient Semantic Table Interpretation using TableMiner+

This article introduces TableMiner+, a Semantic Table Interpretation method that annotates Web tables both effectively and efficiently. Built on our previous work TableMiner, the extended version advances the state of the art in several ways. First, it improves annotation accuracy by using various types of contextual information, both inside and outside tables, as features for inference. Second, it reduces computational overhead by adopting an incremental, bootstrapping approach that first creates preliminary and partial annotations of a table using ‘sample’ data in the table, then uses the outcome as a ‘seed’ to guide interpretation of the remaining content. This is followed by a message passing process that iteratively refines results on the entire table to create the final optimal annotations. Third, it handles all annotation tasks of Semantic Table Interpretation (e.g., annotating a column, or entity cells), while state-of-the-art methods are limited in different ways. We also compile the largest dataset known to date and extensively evaluate TableMiner+ against four baselines and two re-implemented state-of-the-art methods (near-identical, as adaptations are needed due to the use of different knowledge bases). TableMiner+ consistently outperforms all models under all experimental settings. On the two most diverse datasets covering multiple domains and various table schemata, it improves F1 by between 1 and 42 percentage points depending on the specific annotation task. It also significantly reduces computational overhead in terms of wall-clock time when compared against classic methods that ‘exhaustively’ process the entire table content to build features for inference. As a concrete example, compared against a method based on joint inference implemented with parallel computation, the non-parallel implementation of TableMiner+ achieves a significant improvement in learning accuracy and savings of almost orders of magnitude in wall-clock time.
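
The sample-then-seed idea described in the abstract can be illustrated with a minimal Python sketch. This is not the TableMiner+ implementation: the toy knowledge base `TOY_KB`, the function names, and the stopping rule (stop once the leading concept has stayed the same for `patience` consecutive cells) are all assumptions made for illustration, and the message passing refinement step is omitted.

```python
from collections import Counter

# Hypothetical toy knowledge base: cell text -> candidate (entity, concept) pairs.
TOY_KB = {
    "Paris": [("Paris_France", "City"), ("Paris_Hilton", "Person")],
    "London": [("London_UK", "City"), ("Jack_London", "Person")],
    "Berlin": [("Berlin_Germany", "City"), ("Irving_Berlin", "Person")],
    "Rome": [("Rome_Italy", "City")],
    "Madrid": [("Madrid_Spain", "City")],
}


def preliminary_column_concept(cells, kb, patience=2):
    """Read cells one by one and stop early once the leading concept has
    stayed the same for `patience` consecutive cells (an assumed rule)."""
    votes = Counter()
    leader, stable, used = None, 0, 0
    for cell in cells:
        used += 1
        # Each cell votes once for every distinct concept among its candidates.
        for concept in {c for _, c in kb.get(cell, [])}:
            votes[concept] += 1
        if not votes:
            continue
        # Sorting first makes tie-breaking deterministic (alphabetical).
        top = max(sorted(votes), key=lambda c: votes[c])
        stable = stable + 1 if top == leader else 1
        leader = top
        if stable >= patience:
            break
    return leader, used


def annotate_cells(cells, kb, seed_concept):
    """Use the preliminary concept as a seed to constrain entity candidates."""
    annotations = {}
    for cell in cells:
        candidates = kb.get(cell, [])
        matching = [e for e, c in candidates if c == seed_concept]
        if matching:
            annotations[cell] = matching[0]
        elif candidates:
            annotations[cell] = candidates[0][0]  # fall back to the first candidate
        else:
            annotations[cell] = None
    return annotations


column = ["Rome", "Madrid", "Paris", "London", "Berlin"]
concept, cells_used = preliminary_column_concept(column, TOY_KB)
entities = annotate_cells(column, TOY_KB, concept)
```

In this toy run the column concept is settled from the first two cells only, and the resulting seed then steers the ambiguous cells ("Paris", "London", "Berlin") toward city entities without building features from the whole column first.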

Ziqi Zhang
