Properties and identification of human protein drug targets

MOTIVATION: We analysed 148 human drug target proteins and 3573 non-drug targets to identify differences in their properties and to predict new potential drug targets.

RESULTS: Drug targets are rare in organelles; they are more likely to be enzymes, particularly oxidoreductases, transferases or lyases, and not ligases; they are involved in binding, signalling and communication; they are secreted; and they have long lifetimes, as shown by the lack of PEST signals and the presence of N-glycosylation. These findings can be summarized as eight key properties that are desirable in a human drug target: high hydrophobicity, long sequence length, SignalP motif present, no PEST motif, more than two N-glycosylated amino acids, no more than one O-glycosylated Ser, low pI and membrane location. The sequence features were used as inputs to a support vector machine (SVM), allowing any sequence to be assigned to the drug target or non-target class with an accuracy of 96% on the training set. We identified 668 proteins (23%) in the non-target set that have target-like properties. We suggest that drug discovery programmes would be more likely to succeed if new targets are chosen from this set or their homologues.
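As an illustration of how sequence-derived features of this kind might be computed, the following is a minimal sketch and not the authors' pipeline. It assumes the Kyte-Doolittle hydropathy scale for hydrophobicity and approximates N-glycosylation by counting N-X-S/T sequons (X not Pro), which overestimates true glycosylation sites. The function names and the boolean checklist are hypothetical; the checklist only tallies the eight properties listed above and is not a substitute for the SVM.

```python
import re

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# N-X-S/T sequon with X != P; the lookahead lets overlapping sequons be counted.
SEQUON = re.compile(r"N(?=[^P][ST])")

# Hypothetical keys, one per property named in the abstract.
PROPERTIES = (
    "high_hydrophobicity", "long_sequence", "signalp_motif", "no_pest_motif",
    "more_than_two_n_glyc", "at_most_one_o_glyc_ser", "low_pi", "membrane_location",
)


def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy over the whole sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def count_n_glyc_sequons(seq):
    """Number of candidate N-glycosylation sequons (an upper bound on real sites)."""
    return len(SEQUON.findall(seq))


def target_like_score(props):
    """Count how many of the eight properties are flagged True by the caller."""
    return sum(bool(props[name]) for name in PROPERTIES)
```

In practice the boolean flags would come from dedicated predictors (for example signal peptide or localization tools) with thresholds chosen by the analyst; none of those thresholds are specified here.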
