ATC-NLSP: Prediction of the Classes of Anatomical Therapeutic Chemicals Using a Network-Based Label Space Partition Method

The Anatomical Therapeutic Chemical (ATC) classification system proposed by the World Health Organization is a widely accepted drug classification scheme in both the academic and industrial realms. It is a multilabel system that categorizes drugs into multiple classes according to their therapeutic, pharmacological, and chemical attributes. In this study, we adopted a data-driven network-based label space partition (NLSP) method to predict the ATC classes of a given compound within the multilabel learning framework. The proposed method, ATC-NLSP, is trained on similarity-based features: the chemical–chemical interaction, structural similarity, and fingerprint similarity of a compound to other compounds belonging to the different ATC categories. The NLSP method trains a predictor for each label cluster (clusters may overlap) detected by community detection algorithms and takes the ensemble of the predicted labels for a compound as the final prediction. Experimental evaluation based on the jackknife test on the benchmark dataset demonstrated that our method boosted the absolute true rate, the most stringent evaluation metric in this study, from 0.6330 to 0.7497 in comparison to the state-of-the-art approaches. Moreover, the community structures of the label relation graph were detected through the label propagation method. The advantage of multilabel learning over single-label models was shown by label-wise analysis. Our study indicates that the proposed method ATC-NLSP, which adopts ideas from the network research community and captures the correlation of labels in a data-driven manner, is the top-performing model in the ATC prediction task. We believe that the power of NLSP remains to be unleashed for multilabel learning tasks in drug discovery. The source code is freely available at https://github.com/dqwei-lab/ATC.
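
As a rough illustration of the two ideas named in the abstract, the label partition step and the absolute true rate, the sketch below uses only the Python standard library. It is not the authors' implementation (see the repository linked above for that). The function names, the toy labels, and the deterministic tie-breaking rule are assumptions made for this example. It also yields non-overlapping clusters only and omits the per-cluster predictors.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(label_sets):
    """Weighted label graph: edge weight = number of samples carrying both labels."""
    graph = defaultdict(lambda: defaultdict(int))
    for labels in label_sets:
        for label in labels:
            graph[label]  # touch the key so isolated labels still become nodes
        for a, b in combinations(sorted(labels), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=100):
    """Simplified label propagation (assumption: fixed node order and
    smallest-name tie-breaking instead of the usual random choices)."""
    community = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated label keeps its own community
            votes = defaultdict(int)
            for neighbor, weight in graph[node].items():
                votes[community[neighbor]] += weight
            best = max(votes.values())
            candidates = sorted(c for c, v in votes.items() if v == best)
            # Keep the current community if it is already among the winners.
            if community[node] not in candidates:
                community[node] = candidates[0]
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for node, comm in community.items():
        clusters[comm].add(node)
    return sorted(sorted(members) for members in clusters.values())


def absolute_true(y_true, y_pred):
    """Fraction of samples whose predicted label set equals the true set exactly."""
    hits = sum(set(t) == set(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy label sets (hypothetical, not ATC data): A/B and C/D co-occur often.
toy_labels = [{"A", "B"}, {"A", "B"}, {"C", "D"}, {"C", "D"}, {"B", "C"}]
clusters = label_propagation(cooccurrence_graph(toy_labels))
score = absolute_true([{"A"}, {"A", "B"}, {"C"}], [{"A"}, {"A"}, {"C"}])
```

In an NLSP-style pipeline, one multilabel predictor would then be trained per cluster and the per-cluster predictions merged into a single label set per compound. A prediction counts toward the absolute true rate only when that merged set matches the true set exactly, which is why the metric is described as the most stringent.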
