A Random Projection Ensemble Approach to Drug-Target Interaction Prediction

Predicting drug-target interactions is an important step in drug development. Because determining these interactions experimentally is costly and time-consuming, computational prediction is a useful complement. To this end, a random projection ensemble approach is proposed. Drug compounds are encoded with feature descriptors computed by the software "PaDEL-Descriptor", while target proteins are encoded with physicochemical properties of amino acids: from the 544 properties in AAindex1, 34 relatively independent physicochemical properties are extracted. Random projection of the drug-target pair vectors, with different dimensions, maps the original space onto a reduced one and thus yields a transformed vector of fixed dimension. Several such random projections are used to build an ensemble of REPTree classifiers. Experimental results showed that the proposed method significantly outperformed, and ran faster than, other state-of-the-art drug-target predictors.
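The random projection ensemble idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian projection matrix, the function names, the toy data, and the majority vote are all assumptions made for illustration, and a simple nearest-centroid classifier stands in for REPTree (a decision tree learner from Weka), which is not reproduced here.

```python
import random

def make_projection(in_dim, out_dim, rng):
    """Gaussian random matrix (out_dim x in_dim), scaled by 1/sqrt(out_dim)."""
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]

def project(matrix, vector):
    """Map a vector into the reduced space: y = R x."""
    return [sum(r * x for r, x in zip(row, vector)) for row in matrix]

def fit_centroids(points, labels):
    """Stand-in base learner (NOT REPTree): one mean vector per class."""
    sums, counts = {}, {}
    for p, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict_centroid(centroids, point):
    """Return the class whose centroid is closest to the projected point."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(c, point))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

def fit_ensemble(X, y, out_dim, n_members, seed=0):
    """Train one base learner per independent random projection."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        R = make_projection(len(X[0]), out_dim, rng)
        members.append((R, fit_centroids([project(R, x) for x in X], y)))
    return members

def predict_ensemble(members, x):
    """Combine the members' predictions by majority vote (an assumption)."""
    votes = [predict_centroid(c, project(R, x)) for R, c in members]
    return max(sorted(set(votes)), key=votes.count)

# Hypothetical toy data: 6-dimensional "drug-target pair" vectors, two classes.
X = [[0.0] * 6] * 3 + [[1.0] * 6] * 3
y = [0, 0, 0, 1, 1, 1]
members = fit_ensemble(X, y, out_dim=3, n_members=5)
```

Each ensemble member sees the data through a different random matrix, so the members differ even though they share the same training set, and every projected vector has the same fixed length `out_dim`. In practice the descriptor vectors would come from PaDEL-Descriptor and the AAindex1 properties rather than from toy values.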
