Missing Value Estimation for Compound‐Target Activity Data

Relationships between drug targets and associated diseases have traditionally been investigated by means of sequence similarity, comparative protein modeling, and pathway analysis. Recently, a complementary paradigm has emerged that links targets and drugs through biological responses in activity data and visualizes the findings as networks. The sparsity of the available data has been identified as one of the obstacles to finding novel interactions in this way. In this article, we survey estimation methods that address data sparsity. We describe the advantages and limitations of each method and demonstrate an exemplary application to compound‐target activity data. With such imputation methods in hand, efforts across molecular informatics can be combined, yielding novel insights into ligand‐target space.
