Web Table Column Type Detection Using Deep Learning and Probability Graph Model

The rich knowledge contained on the web plays an important role in research and practical applications, including web search, multi-question answering, and knowledge base construction. Correctly detecting the semantic types of all data columns is critical to understanding a web table. Traditional methods have the following limitations: (1) most of them rely on dictionary lookup and regular expression matching, and are generally not robust to dirty data; (2) they consider only character data and ignore numeric data, which accounts for a large proportion of table content; (3) some models use the characteristics of a single column only and do not consider the special organizational structure of the table. In this paper, we propose a column type detection method that combines deep learning with a probability graph model, taking both the semantic features of a single column and the interactions between multiple columns into account to improve prediction accuracy. Experimental results show that our method achieves higher accuracy than state-of-the-art approaches.
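The idea of combining per-column evidence with interactions between columns can be illustrated with a small sketch. This is not the paper's model: the label set `TYPES`, the scores in `unary` and `pairwise`, and the helper names `joint_score` and `best_assignment` are all hypothetical, and the exhaustive search stands in for whatever inference the graph model actually uses. The per-column scores play the role of a neural classifier's output, and the pairwise scores reward or penalize types that appear together in one table.

```python
from itertools import product

# Hypothetical label set, invented purely for illustration.
TYPES = ["city", "country", "year"]

def joint_score(assignment, unary, pairwise):
    """Sum per-column scores plus a compatibility score for every column pair."""
    score = sum(unary[i][t] for i, t in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            pair = frozenset((assignment[i], assignment[j]))
            score += pairwise.get(pair, 0.0)
    return score

def best_assignment(unary, pairwise, types=TYPES):
    """Exhaustively search all type assignments; feasible only for small tables."""
    candidates = product(types, repeat=len(unary))
    return max(candidates, key=lambda a: joint_score(a, unary, pairwise))

# Assumed per-column log-scores, standing in for a neural classifier's output.
unary = [
    {"city": -0.1, "country": -2.5, "year": -5.0},
    {"city": -0.6, "country": -0.9, "year": -5.0},
]
# Assumed compatibilities: city and country columns co-occur often,
# while two city columns in one table are unlikely.
pairwise = {frozenset(("city", "country")): 1.0, frozenset(("city",)): -1.0}

# Baseline that labels each column on its own.
independent = tuple(max(col, key=col.get) for col in unary)
# Joint labeling that also accounts for column interactions.
joint = best_assignment(unary, pairwise)
print(independent)
print(joint)
```

With these made-up numbers, labeling each column independently assigns `city` to both columns, while the joint search prefers `city` and `country` because the pairwise term outweighs the small difference in the second column's own scores.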
