LOCAL KERNEL CANONICAL CORRELATION ANALYSIS WITH APPLICATION TO VIRTUAL DRUG SCREENING

Drug discovery is the process of identifying compounds that have potentially meaningful biological activity. A major challenge is that the number of compounds to search can be very large, sometimes in the millions, making experimental testing intractable. For this reason, computational methods are employed to filter out compounds that do not exhibit strong biological activity. This filtering step, also called virtual screening, reduces the search space so that the remaining compounds can be tested experimentally. In this paper we propose several novel approaches to the problem of virtual screening based on Canonical Correlation Analysis (CCA) and on a kernel-based extension. Spectral learning ideas motivate our proposed new method, called Indefinite Kernel CCA (IKCCA). We show the strong performance of this approach on both a toy problem and real-world data, with dramatic improvements in the predictive accuracy of virtual screening over an existing methodology.
