Assigning Semantic Labels to Data Sources

There is strong demand for finding and integrating heterogeneous data sources, which requires mapping the attributes of a source to the concepts and relationships defined in a domain ontology. In this paper, we present a new approach to finding these mappings, which we call semantic labeling. Previous approaches map each data value individually, typically by using supervised machine-learning techniques to learn a model from features extracted from the data. Our approach differs in that we take a holistic view of the data values corresponding to a semantic label and use techniques that treat these values collectively, which makes it possible to capture characteristic properties of the values associated with a semantic label as a whole. Our approach supports both textual and numeric data and proposes the top $k$ semantic labels along with their associated confidence scores. Our experiments show that the approach achieves higher label prediction accuracy, has lower time complexity, and is more scalable than existing systems.
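The idea of scoring a column's values collectively, rather than one value at a time, can be illustrated with a minimal sketch. The abstract does not name the similarity measures used, so the choices here are assumptions made for illustration only: cosine similarity over pooled token counts for textual columns, and one minus the two-sample Kolmogorov-Smirnov statistic for numeric columns. All function names (`token_counts`, `cosine`, `ks_similarity`, `top_k_labels`) and the sample data are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter


def token_counts(values):
    """Pool every value of a column into one bag of lowercase tokens."""
    counts = Counter()
    for value in values:
        counts.update(str(value).lower().split())
    return counts


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ks_similarity(xs, ys):
    """One minus the two-sample Kolmogorov-Smirnov statistic (assumed measure)."""
    if not xs or not ys:
        return 0.0
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in set(xs) | set(ys):
        # Largest gap between the two empirical distribution functions.
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return 1.0 - d


def is_numeric(values):
    return all(isinstance(v, (int, float)) for v in values)


def top_k_labels(column, labeled_columns, k=3):
    """Rank known semantic labels by similarity to the whole column."""
    numeric = is_numeric(column)
    scores = []
    for label, known in labeled_columns.items():
        if numeric and is_numeric(known):
            score = ks_similarity(column, known)
        elif not numeric and not is_numeric(known):
            score = cosine(token_counts(column), token_counts(known))
        else:
            score = 0.0  # textual vs. numeric: treated as no match
        scores.append((label, score))
    scores.sort(key=lambda pair: -pair[1])
    return scores[:k]


# Hypothetical training data: values already assigned a semantic label.
labeled = {
    "city": ["los angeles", "new york", "san diego"],
    "name": ["craig knoblock", "kristina lerman"],
    "age": [23, 35, 41, 58],
    "year": [1994, 2003, 2012, 2014],
}
text_ranking = top_k_labels(["new york", "san jose"], labeled, k=2)
numeric_ranking = top_k_labels([30, 44, 52], labeled, k=1)
```

In this sketch the score attached to each label plays the role of the confidence score; a real system would likely weight tokens and calibrate scores rather than use raw similarities.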

[1]  Michael Stonebraker,et al.  Data Curation at Scale: The Data Tamer System , 2013, CIDR.

[2]  Tim Finin,et al.  Exploiting a Web of Semantic Data for Interpreting Tables , 2010 .

[3]  Daisy Zhe Wang,et al.  WebTables: exploring the power of tables on the web , 2008, Proc. VLDB Endow..

[4]  G. Casella,et al.  Springer Texts in Statistics , 2016 .

[5]  Timothy W. Finin,et al.  Semantic Message Passing for Generating Linked Data from Tables , 1999, SEMWEB.

[6]  Pedro M. Domingos,et al.  Learning to Match the Schemas of Data Sources: A Multistrategy Approach , 2003, Machine Learning.

[7]  Craig A. Knoblock,et al.  A Scalable Approach to Learn Semantic Models of Structured Sources , 2014, 2014 IEEE International Conference on Semantic Computing.

[8]  Jayant Madhavan,et al.  Recovering Semantics of Tables on the Web , 2011, Proc. VLDB Endow..

[9]  Exploiting Structure within Data for Accurate Labeling using Conditional Random Fields , 2012 .

[10]  Kristina Lerman,et al.  Automatically Constructing Semantic Web Services from Online Sources , 2009, SEMWEB.

[11]  Natalya F. Noy,et al.  Semantic integration: a survey of ontology-based approaches , 2004, SGMD.

[12]  Nick Craswell Mean Reciprocal Rank , 2009, Encyclopedia of Database Systems.

[13]  Chris Clifton,et al.  Semantic Integration in Heterogeneous Databases Using Neural Networks , 1994, VLDB.

[14]  Daniel P. Miranker,et al.  On directly mapping relational databases to RDF and OWL , 2012, WWW.

[15]  Sunita Sarawagi,et al.  Annotating and searching web tables using entities, types and relationships , 2010, Proc. VLDB Endow..

[16]  Stephen E. Fienberg,et al.  Testing Statistical Hypotheses , 2005 .