Facilitating Data Discovery for Large-scale Science Facilities using Knowledge Networks

Large-scale multiuser scientific facilities, such as geographically distributed observatories, remote instruments, and experimental platforms, represent some of the largest national investments and can enable dramatic advances across many areas of science. Recent examples of such advances include the detection of gravitational waves and the imaging of a black hole’s event horizon. However, as the number of such facilities and their users grows, along with the complexity, diversity, and volume of their data products, finding and accessing relevant data is becoming increasingly challenging, which limits the potential impact of these facilities. These challenges are further amplified as scientists and application workflows increasingly try to integrate data from facilities across diverse domains. In this paper, we leverage concepts underlying recommender systems, which have proven highly effective in e-commerce, to address these data-discovery and data-access challenges for large-scale distributed scientific facilities. We first analyze data from facilities and identify and model user-query patterns in terms of facility location and spatial locality, domain-specific data models, and user associations. We then use this analysis to generate a knowledge graph and develop the collaborative knowledge-aware graph attention network (CKAT) recommendation model, which leverages graph neural networks (GNNs) to explicitly encode collaborative signals through propagation and combine them with knowledge associations. Moreover, we integrate a knowledge-aware neural attention mechanism that enables CKAT to attend to key information while suppressing irrelevant noise, thereby increasing recommendation accuracy. We apply the proposed model to two real-world facility datasets and empirically demonstrate that CKAT can effectively facilitate data discovery, significantly outperforming several strong state-of-the-art baseline models.
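The abstract describes CKAT only at a high level and does not give its equations. As a minimal sketch, assuming a KGAT-style formulation (relation-specific projections, a tanh attention score normalized by a softmax over each entity's neighbours, and a GCN-style aggregator), one forward pass over a collaborative knowledge graph that places users, data items, and facility attributes in a single graph could look like the NumPy code below. The toy graph, node names (`u0`, `item_a`, `site_north`, `ctd`), relation names, and function names are hypothetical illustrations, not the paper's actual schema or implementation.

```python
import numpy as np

# Hypothetical toy collaborative knowledge graph: users, data items and
# attribute entities (facility sites, instrument types) share one index space.
NODES = ["u0", "u1", "item_a", "item_b", "item_c", "site_north", "site_south", "ctd"]
IDX = {name: i for i, name in enumerate(NODES)}
RELATIONS = ["queried", "located_at", "measured_by"]
TRIPLES = [
    ("u0", "queried", "item_a"),
    ("u1", "queried", "item_b"),
    ("item_a", "located_at", "site_north"),
    ("item_c", "located_at", "site_north"),
    ("item_b", "located_at", "site_south"),
    ("item_a", "measured_by", "ctd"),
    ("item_c", "measured_by", "ctd"),
]


def build_edges(triples):
    """Index arrays (heads, rels, tails), with one inverse edge per triple."""
    n_rel = len(RELATIONS)
    heads, rels, tails = [], [], []
    for h, r, t in triples:
        rid = RELATIONS.index(r)
        heads += [IDX[h], IDX[t]]
        rels += [rid, rid + n_rel]  # inverse relation ids are offset by n_rel
        tails += [IDX[t], IDX[h]]
    return np.array(heads), np.array(rels), np.array(tails)


def attention_layer(E, R, W_r, W, heads, rels, tails):
    """One KGAT-style knowledge-aware attentive propagation step."""
    # Project head and tail embeddings into each edge's relation space.
    ph = np.einsum("nij,nj->ni", W_r[rels], E[heads])
    pt = np.einsum("nij,nj->ni", W_r[rels], E[tails])
    # Attention logit: (W_r e_t)^T tanh(W_r e_h + e_r).
    logits = np.sum(pt * np.tanh(ph + R[rels]), axis=1)
    # Softmax over the neighbours of each head entity.
    att = np.zeros_like(logits)
    for h in np.unique(heads):
        m = heads == h
        ex = np.exp(logits[m] - logits[m].max())
        att[m] = ex / ex.sum()
    # Attention-weighted neighbourhood embedding e_{N_h} (scatter-add by head).
    neigh = np.zeros_like(E)
    np.add.at(neigh, heads, att[:, None] * E[tails])
    # GCN-style aggregator LeakyReLU(W (e_h + e_{N_h})) with slope 0.2.
    z = (E + neigh) @ W.T
    return np.where(z > 0, z, 0.2 * z), att


def score_users_items(dim=8, n_layers=2, seed=0):
    """Untrained forward pass; returns (user x item scores, last attention)."""
    rng = np.random.default_rng(seed)
    heads, rels, tails = build_edges(TRIPLES)
    n_rel = 2 * len(RELATIONS)
    E = rng.normal(scale=0.1, size=(len(NODES), dim))
    R = rng.normal(scale=0.1, size=(n_rel, dim))
    W_r = rng.normal(scale=dim ** -0.5, size=(n_rel, dim, dim))
    layers, att = [E], None
    for _ in range(n_layers):
        W = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        E, att = attention_layer(E, R, W_r, W, heads, rels, tails)
        layers.append(E)
    final = np.concatenate(layers, axis=1)  # concatenate all layer outputs
    users = [IDX["u0"], IDX["u1"]]
    items = [IDX["item_a"], IDX["item_b"], IDX["item_c"]]
    return final[users] @ final[items].T, att  # inner-product relevance scores


scores, att = score_users_items()
print(scores.shape)  # (2, 3): one score per (user, data item) pair
```

Because the weights here are random and untrained, the scores only illustrate the data flow; in a real system the embeddings and matrices would first be learned, for example with a pairwise ranking loss, before ranking unseen data items for each user. The inverse edges let information flow from items back to users, and stacking two layers lets a user's representation absorb entities two hops away, such as the site where a previously queried data item was collected.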
