RDF shape induction using knowledge base profiling

Knowledge Graphs (KGs) are becoming the core of most artificial intelligence and cognitive applications. Popular KGs such as DBpedia and Wikidata have chosen the RDF data model to represent their data. Despite its advantages, RDF data poses challenges, one of which is data validation. Ontologies, which specify domain conceptualizations for RDF data, are designed for entailment rather than validation, and most lack the granular information needed to validate constraints. Recent work on RDF Shapes and the standardization of languages such as SHACL and ShEx provide better mechanisms for representing integrity constraints for RDF data. However, manually creating constraints for large KGs is still a tedious task. In this paper, we present a data-driven approach for inducing integrity constraints for RDF data using data profiling. These constraints can be combined into RDF Shapes and used to validate RDF graphs. Our method uses machine learning techniques to automatically generate RDF Shapes, with profiled RDF data as features. In our experiments, the proposed approach achieved 97% precision in deriving RDF Shapes with cardinality constraints for a subset of DBpedia data.
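To make the idea of inducing cardinality constraints from profiled data concrete, the following is a minimal sketch, not the method evaluated in the paper. It profiles how many values each instance of a class has for each property, then applies a naive rule in place of the machine learning step: a property is treated as mandatory if every instance has it, and as single-valued if no instance has more than one value. The function names, the `ex:` prefix, and the plain-tuple triple representation are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"  # assumed prefixed name used in the toy triples


def profile_cardinalities(triples):
    """Return {(class, property): [value count per instance of class]}.

    Instances without the property contribute a count of 0.
    """
    types = defaultdict(set)   # subject -> set of classes
    per_subject = Counter()    # (subject, property) -> number of values
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            per_subject[(s, p)] += 1

    # Collect the properties observed on at least one instance of each class.
    props_by_class = defaultdict(set)
    for (s, p) in per_subject:
        for cls in types.get(s, ()):
            props_by_class[cls].add(p)

    profile = defaultdict(list)
    for s, classes in types.items():
        for cls in classes:
            for p in props_by_class[cls]:
                profile[(cls, p)].append(per_subject[(s, p)])
    return dict(profile)


def induce_constraints(profile):
    """Naive stand-in for the learned classifier (an assumption).

    Returns {(class, property): (min_count, max_count)} where min_count is
    0 or 1 and max_count is 1 or None (None meaning unbounded).
    """
    constraints = {}
    for key, counts in profile.items():
        min_count = 1 if min(counts) >= 1 else 0
        max_count = 1 if max(counts) <= 1 else None
        constraints[key] = (min_count, max_count)
    return constraints


def to_shacl(cls, constraints):
    """Render the constraints for one class as a SHACL node shape in Turtle."""
    local_name = cls.split(":")[-1]
    parts = [f"ex:{local_name}Shape a sh:NodeShape", f"    sh:targetClass {cls}"]
    for (c, prop), (lo, hi) in sorted(constraints.items()):
        if c != cls:
            continue
        body = f"sh:path {prop}"
        if lo:
            body += f" ; sh:minCount {lo}"
        if hi is not None:
            body += f" ; sh:maxCount {hi}"
        parts.append(f"    sh:property [ {body} ]")
    return " ;\n".join(parts) + " ."
```

On real data, exact minimum and maximum counts are brittle because a single erroneous instance flips the result, which is one reason a learned model over richer profiling features may be preferable to a fixed rule like the one above.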
