Computational prediction of ATC codes of drug-like compounds using tiered learning

The Anatomical Therapeutic Chemical (ATC) code system is a classification proposed by the World Health Organization (WHO) that assigns codes to compounds based on their therapeutic, pharmacological and chemical characteristics, as well as their in vivo site of activity. The ability to predict the ATC code of an arbitrary compound with high accuracy could substantially aid the selection of molecules for lead identification. We propose a computational approach to this problem that exploits a natural pharmacological constraint: anatomical-therapeutic biological activity of a certain type precludes many other types of activity. The method applies machine learning in a tiered architecture, in which the prediction of the ATC code at a given level is constrained by the ATC code at the higher levels. Using this learning architecture, we have built classifiers that incorporate information from a compound's structure as well as its chemical and protein interactions. The approach has been validated on 2335 drugs from the ChEMBL database in both cross-validation and test settings. It achieves a prediction accuracy of 78.72%, which is comparable to or better than that of state-of-the-art methods.
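To make the tiered constraint concrete, the following is a minimal sketch of greedy top-down decoding over ATC levels. It assumes that a separate classifier per level has already produced a score for every candidate code at that level; the function name `tiered_predict`, the `Scores` type and the toy score values are hypothetical and for illustration only, and this is not presented as the authors' exact procedure. It relies only on the fact that an ATC code at a lower level extends the code of its parent level (for example `N`, then `N02`, then `N02B`).

```python
from typing import Dict, List

# Hypothetical per-level classifier output for one compound: a score for every
# candidate ATC code at that level (e.g. predicted probabilities).
Scores = Dict[str, float]


def tiered_predict(level_scores: List[Scores]) -> List[str]:
    """Predict one ATC code per level, from the top level downwards.

    At each level only codes that extend the prefix chosen at the previous
    level are considered, so a lower-level prediction can never contradict
    a higher-level one.
    """
    path: List[str] = []
    prefix = ""
    for scores in level_scores:
        # Keep only candidates consistent with the higher-level prediction.
        allowed = {code: s for code, s in scores.items() if code.startswith(prefix)}
        if not allowed:
            break  # no consistent candidate at this level; stop descending
        prefix = max(allowed, key=allowed.get)
        path.append(prefix)
    return path


if __name__ == "__main__":
    # Toy scores (hypothetical, not real model output).
    toy_scores = [
        {"N": 0.7, "A": 0.2, "C": 0.1},           # level 1: anatomical main group
        {"A02": 0.9, "N02": 0.6, "N05": 0.3},     # level 2: therapeutic subgroup
        {"N02A": 0.4, "N02B": 0.5, "A02B": 0.8},  # level 3: therapeutic/pharmacological subgroup
    ]
    print(tiered_predict(toy_scores))  # → ['N', 'N02', 'N02B']
```

In this toy example an unconstrained argmax at level 2 would pick `A02` (score 0.9), which is inconsistent with the level-1 prediction `N`; the tiered decoding rules it out. Greedy decoding is the simplest choice here; a variant that combines scores along whole paths would also respect the hierarchy.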