Automated and Explainable Ontology Extension based on Deep Learning: A Case Study in the Chemical Domain

Reference ontologies provide a shared vocabulary and knowledge resource for their domain. Manual construction keeps their quality high, which allows them to be widely accepted across their community. However, the manual development process does not scale to large domains. We present a new methodology for automatic ontology extension and apply it to the ChEBI ontology, a prominent reference ontology for life sciences chemistry. We train a Transformer-based deep learning model on the chemical structures of the leaf nodes of the ChEBI ontology and the classes to which they belong. The trained model can then automatically classify previously unseen chemical structures. The proposed model achieves an overall F1 score of 0.80, an improvement of 6 percentage points over our previous results on the same dataset. Additionally, we demonstrate how visualizing the model’s attention weights can help to explain the results by providing insight into how the model made its decisions.
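
To make the evaluation setting concrete, the following is a minimal sketch of how an F1 score can be computed when each chemical structure may belong to several ontology classes at once. It is not the authors' evaluation code: the abstract does not say how the overall F1 score is averaged, so micro-averaging is an assumption here, and the function name `micro_f1`, the SMILES strings, and the class labels are hypothetical toy data.

```python
from typing import Dict, Set


def micro_f1(gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1 over all (structure, class) pairs.

    Assumption: micro-averaging; the abstract only reports an "overall" F1.
    """
    tp = fp = fn = 0
    for smiles, gold_classes in gold.items():
        # A structure with no prediction counts as predicting no classes.
        pred_classes = pred.get(smiles, set())
        tp += len(gold_classes & pred_classes)   # correctly predicted classes
        fp += len(pred_classes - gold_classes)   # predicted but not in gold
        fn += len(gold_classes - pred_classes)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: SMILES strings mapped to illustrative class labels.
gold = {
    "CCO": {"alcohol", "organic molecule"},
    "CC(=O)O": {"carboxylic acid", "organic molecule"},
    "c1ccccc1": {"aromatic compound", "organic molecule"},
}
pred = {
    "CCO": {"alcohol", "organic molecule"},
    "CC(=O)O": {"organic molecule"},                                # one class missed
    "c1ccccc1": {"aromatic compound", "organic molecule", "alcohol"},  # one extra class
}

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

In this toy case there are five correct class assignments, one spurious one, and one missed one, so precision and recall are both 5/6. A macro-averaged variant would instead compute F1 per class and average the results, which weights rare classes more heavily.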
