Knowledge graph fusion for smart systems: A Survey

Abstract The emergence of disruptive technologies such as big data, the Internet of Things, and artificial intelligence has led society to generate enormous volumes of data. Capturing and fusing knowledge from massive amounts of data effectively, efficiently, and transparently is becoming an increasingly popular and crucial topic. In this study, we aim to provide a broad, complete, and systematic overview of the definitions and challenges of knowledge graph fusion, a holistic approach to integrating, enhancing, and unifying knowledge graphs. We further discuss, from multiple perspectives, advanced techniques for knowledge graph fusion along with the practical smart systems that leverage it. We believe that this survey can serve as a reference for system practitioners and researchers in overcoming current obstacles and shaping their future directions.
