Collaborative Filtering on a Family of Biological Targets

Building a QSAR model of a new biological target for which few screening data are available is a statistical challenge. However, the new target may be part of a bigger family, for which we have more screening data. Collaborative filtering or, more generally, multi-task learning, is a machine learning approach that improves the generalization performance of an algorithm by using information from related tasks as an inductive bias. We use collaborative filtering techniques for building predictive models that link multiple targets to multiple examples. The more commonalities between the targets, the better the multi-target model that can be built. We show an example of a multi-target neural network that can use family information to produce a predictive model of an undersampled target. We evaluate JRank, a kernel-based method designed for collaborative filtering. We show their performance on compound prioritization for an HTS campaign and the underlying shared representation between targets. JRank outperformed the neural network both in the single- and multi-target models.

[1]  F ROSENBLATT,et al.  The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.

[2]  Edmund R. Malinowski,et al.  Factor Analysis in Chemistry , 1980 .

[3]  A. Balaban Highly discriminating distance-based topological index , 1982 .

[4]  J. Hanley,et al.  The meaning and use of the area under a receiver operating characteristic (ROC) curve. , 1982, Radiology.

[5]  Milan Randic,et al.  On molecular identification numbers , 1984, J. Chem. Inf. Comput. Sci..

[6]  L. Kier Shape Indexes of Orders One and Three from Molecular Graphs , 1986 .

[7]  Michel Petitjean,et al.  Applications of the radius-diameter diagram to the classification of topological and geometrical shapes of chemical compounds , 1992, J. Chem. Inf. Comput. Sci..

[8]  Douglas B. Terry,et al.  Using collaborative filtering to weave an information tapestry , 1992, CACM.

[9]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[10]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[11]  John Riedl,et al.  GroupLens: an open architecture for collaborative filtering of netnews , 1994, CSCW '94.

[12]  Lemont B. Kier,et al.  Electrotopological State Indices for Atom Types: A Novel Combination of Electronic, Topological, and Valence State Information , 1995, J. Chem. Inf. Comput. Sci..

[13]  Christopher M. Bishop,et al.  Neural networks for pattern recognition , 1995 .

[14]  Jonathan Baxter,et al.  Learning Model Bias , 1995, NIPS.

[15]  William I. Gasarch,et al.  Book Review: An introduction to Kolmogorov Complexity and its Applications Second Edition, 1997 by Ming Li and Paul Vitanyi (Springer (Graduate Text Series)) , 1997, SIGACT News.

[16]  Yoav Shoham,et al.  Fab: content-based, collaborative recommendation , 1997, CACM.

[17]  Charles X. Ling,et al.  Data Mining for Direct Marketing: Problems and Solutions , 1998, KDD.

[18]  Rich Caruana,et al.  Multitask Learning , 1997, Machine-mediated learning.

[19]  Yoav Freund,et al.  Large Margin Classification Using the Perceptron Algorithm , 1998, COLT.

[20]  Johann Gasteiger,et al.  Neural networks in chemistry and drug design , 1999 .

[21]  Tudor I. Oprea,et al.  Property distribution of drug-related chemical databases* , 2000, J. Comput. Aided Mol. Des..

[22]  Yoshua Bengio,et al.  Bias learning, knowledge sharing , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.

[23]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[24]  Koby Crammer,et al.  Pranking with Ranking , 2001, NIPS.

[25]  James G. Nourse,et al.  Reoptimization of MDL Keys for Use in Drug Discovery , 2002, J. Chem. Inf. Comput. Sci..

[26]  Thomas Hofmann,et al.  Unifying collaborative and content-based filtering , 2004, ICML.

[27]  Jonathan Baxter,et al.  A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling , 1997, Machine Learning.

[28]  Z. Deng,et al.  Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. , 2004, Journal of medicinal chemistry.

[29]  K. Wanner,et al.  Methods and Principles in Medicinal Chemistry , 2007 .

[30]  L. Hall,et al.  The Molecular Connectivity Chi Indexes and Kappa Shape Indexes in Structure‐Property Modeling , 2007 .