Machine-OlF-Action: a unified framework for developing and interpreting machine-learning models for chemosensory research

AVAILABILITY AND IMPLEMENTATION

Machine learning (ML)-based techniques are emerging as state-of-the-art methods in chemoinformatics for identifying biologically relevant molecules in large databases selectively, effectively, and quickly. Many such techniques have been proposed. However, their sparse availability and their dependence on advanced computational literacy have hindered wider adoption, at least in chemosensory research involving G-Protein Coupled Receptors (GPCRs). Here we report Machine-OlF-Action (MOA), a user-friendly, open-source computational framework that builds classification models from user-supplied SMILES (simplified molecular-input line-entry system) strings of chemicals together with their activation status. MOA integrates several popular chemical databases that collectively contain ∼103 million chemical entities, and it also supports customized screening of user-supplied chemical datasets. A key feature of MOA is its ability to embed molecules according to the similarity of their local neighborhoods, using the state-of-the-art model interpretability framework LIME. We demonstrate the utility of MOA by identifying previously unreported agonists for the human olfactory receptor OR1A1 and the mouse olfactory receptor MOR174-9, leveraging the chemical features of their known agonists and non-agonists. In summary, we have developed an ML-powered software playground for performing supervised learning tasks involving chemical compounds.
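The workflow summarized above (SMILES strings plus activation labels in, a classifier out, followed by screening and LIME-based embedding) can be sketched without any chemistry toolkit. The block below is a minimal, hypothetical illustration under stated assumptions, not MOA's implementation. `featurize` counts a few single SMILES characters as a stand-in for real molecular descriptors, the classifier is plain logistic regression, and `lime_embedding` follows the general LIME recipe (perturb the input, weight perturbations by proximity, fit a weighted linear surrogate) using binary feature masks. All function names, parameters, and the toy labels are assumptions made for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical toy featurizer: counts of single SMILES characters.
# MOA computes real molecular descriptors; this stand-in keeps the
# sketch dependency-free and is NOT chemically meaningful.
TOKENS = ["C", "c", "O", "N", "S", "=", "("]


def featurize(smiles):
    counts = Counter(smiles)
    return [float(counts[t]) for t in TOKENS]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.05, epochs=300):
    """Stochastic-gradient logistic regression; y = 1 for agonists."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    # Return a predictor giving p(agonist) for a feature vector.
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def solve(A, v):
    """Gauss-Jordan elimination with partial pivoting for A @ beta = v."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def lime_embedding(x, predict, n_samples=200, width=0.75, ridge=1e-3, seed=0):
    """LIME-style local surrogate around x: switch random feature subsets
    off, weight each perturbation with an exponential proximity kernel,
    and fit a weighted ridge regression of the model output on the masks.
    Returns one coefficient per feature (the intercept is dropped)."""
    rng = random.Random(seed)
    d = len(x)
    A = [[0.0] * (d + 1) for _ in range(d + 1)]  # Z^T W Z
    v = [0.0] * (d + 1)                          # Z^T W y
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(d)]
        target = predict([xi * m for xi, m in zip(x, mask)])
        dist = 1.0 - sum(mask) / d  # fraction of features switched off
        weight = math.exp(-dist ** 2 / width ** 2)
        z = [1.0] + [float(m) for m in mask]  # intercept column + mask
        for i in range(d + 1):
            v[i] += weight * z[i] * target
            for j in range(d + 1):
                A[i][j] += weight * z[i] * z[j]
    for i in range(1, d + 1):
        A[i][i] += ridge  # keeps the normal equations well-conditioned
    return solve(A, v)[1:]


if __name__ == "__main__":
    # Made-up labels for illustration only (1 = agonist, 0 = non-agonist);
    # these are NOT measurements for OR1A1, MOR174-9, or any receptor.
    train = [("CCO", 1), ("CC(=O)O", 1), ("OCCO", 1), ("CCCCO", 1),
             ("CCCC", 0), ("c1ccccc1", 0), ("CCN", 0), ("CCCCCC", 0)]
    predict = train_logreg([featurize(s) for s, _ in train],
                           [label for _, label in train])

    # "Screen" a hypothetical user-supplied library, best candidates first.
    library = ["CCCO", "CCCCCCC", "OCC(O)CO", "c1ccncc1"]
    scores = {s: predict(featurize(s)) for s in library}
    for s in sorted(library, key=scores.get, reverse=True):
        print(f"{s:10s} p(agonist) = {scores[s]:.2f}")

    # Per-molecule local-surrogate coefficients, usable as embedding
    # coordinates when comparing molecules' local neighborhoods.
    for s in library:
        print(s, [round(c, 3) for c in lime_embedding(featurize(s), predict)])
```

Ranking the library by predicted probability mimics screening. The per-molecule surrogate coefficients are one plausible way to place molecules in a space where neighbors share similar local explanations. A realistic pipeline would replace the toy featurizer with a cheminformatics toolkit (for example, RDKit) for SMILES parsing and descriptor calculation.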
