TermPicker: Enabling the Reuse of Vocabulary Terms by Exploiting Data from the Linked Open Data Cloud

Deciding which vocabulary terms to use when modeling data as Linked Open Data (LOD) is far from trivial. Choosing overly general vocabulary terms, or terms from vocabularies that other LOD datasets do not use, is likely to yield a data representation that is harder for humans to understand and for Linked Data applications to consume. In this technical report, we propose TermPicker, a novel approach to vocabulary reuse that recommends RDF types and properties by exploiting information on how other data providers on the LOD cloud use RDF types and properties to describe their data. To this end, we introduce the notion of schema-level patterns (SLPs). An SLP captures how sets of RDF types are connected via sets of properties within some data collection, e.g., within a dataset on the LOD cloud. TermPicker uses such SLPs to generate a ranked list of vocabulary terms for reuse. The recommended terms are ordered by a ranking model that is computed using the machine learning approach Learning To Rank (L2R). We evaluate TermPicker by its recommendation quality, measured using the Mean Average Precision (MAP) and the Mean Reciprocal Rank at the first five positions (MRR@5). Our results show that using SLPs improves the recommendation quality by 29% to 36% compared to the previously investigated baselines of recommending solely popular vocabulary terms or terms from the same vocabulary. The overall best results are achieved using SLPs in conjunction with the Learning To Rank algorithm Random Forests.
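To make the notion of an SLP and the two evaluation measures more concrete, the following Python sketch may help. It is only an illustration under stated assumptions, not TermPicker's implementation: it reads an SLP as a triple (subject types, connecting properties, object types) derived per linked pair of resources, which is one plausible interpretation of the description above. The function names, the prefixed-string encoding of terms, and the `ex:` resources are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # prefixed-name shorthand, assumed for this sketch


def schema_level_patterns(triples):
    """Collect (subject types, properties, object types) patterns.

    Assumption: one pattern per linked (subject, object) pair, built from
    the RDF types of both resources and the properties connecting them.
    """
    types = defaultdict(set)  # resource -> set of its RDF types
    links = defaultdict(set)  # (subject, object) -> connecting properties
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            links[(s, o)].add(p)
    return {
        (frozenset(types.get(s, ())), frozenset(props), frozenset(types.get(o, ())))
        for (s, o), props in links.items()
    }


def reciprocal_rank_at_k(ranked, relevant, k=5):
    """1/rank of the first relevant term within the top k, else 0."""
    for rank, term in enumerate(ranked[:k], start=1):
        if term in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank holding a relevant term."""
    hits, total = 0, 0.0
    for rank, term in enumerate(ranked, start=1):
        if term in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean(values):
    """Average over queries; turns per-query scores into MAP or MRR@5."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Illustrative data only: a publication linked to a person by two properties.
example_triples = [
    ("ex:paper1", RDF_TYPE, "swrc:Publication"),
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:paper1", "dcterms:creator", "ex:alice"),
    ("ex:paper1", "foaf:maker", "ex:alice"),
]
example_slps = schema_level_patterns(example_triples)
```

MAP and MRR@5 are then the means of `average_precision` and `reciprocal_rank_at_k` over all recommendation requests in the evaluation set. Using `frozenset` keeps each pattern hashable, so identical patterns found in different parts of a dataset collapse into a single entry.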
