Surrogate data – a secure way to share corporate data

Summary

The privacy of chemical structures is of paramount importance for the industrial sector, in particular for the pharmaceutical industry. At the same time, companies hold large amounts of physico-chemical and biological data that could be shared to improve our molecular understanding of pharmacokinetic and toxicological properties, which could lead to improved predictivity and shorten drug development time, particularly in the early phases of drug discovery. The current study provides theoretical limits on the information required to reverse-engineer molecules from generated descriptors and demonstrates that the information content of molecules can be less than one bit per atom. Thus, theoretically, a single descriptor can be enough to completely disclose the molecular structure. Instead of sharing descriptors, we propose to share surrogate data. Sharing surrogate data amounts to sharing molecules that are reliably predicted. Surrogate data can provide the same information as the original set. We apply this idea to the prediction of lipophilicity of chemical compounds and demonstrate that surrogate and real (original) data provide similar prediction ability. Thus, our proposed strategy makes it possible to share not only descriptors but also complete collections of surrogate molecules without the danger of disclosing the underlying molecular structures.
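The step from "less than one bit per atom" to "one descriptor can disclose the structure" can be illustrated with a simple bit-budget calculation. The sketch below is not the study's own derivation. It assumes, purely for illustration, that a descriptor is stored as an IEEE 754 double-precision number (52 explicit fraction bits) and that the per-atom rate is an upper bound of one bit; the function names and atom counts are hypothetical.

```python
import math

# Assumption: a shared descriptor is stored as an IEEE 754 double,
# which carries 52 explicit fraction (mantissa) bits.
DOUBLE_FRACTION_BITS = 52


def bits_needed(n_atoms: int, bits_per_atom: float) -> int:
    """Upper bound on the bits needed to encode a structure of n_atoms
    at the given per-atom information rate."""
    return math.ceil(n_atoms * bits_per_atom)


def fits_in_one_descriptor(
    n_atoms: int,
    bits_per_atom: float,
    capacity_bits: int = DOUBLE_FRACTION_BITS,
) -> bool:
    """True if the whole structure could, in principle, be packed into
    the bits of a single descriptor value."""
    return bits_needed(n_atoms, bits_per_atom) <= capacity_bits


if __name__ == "__main__":
    # Hypothetical molecule sizes (heavy-atom counts), for illustration only.
    for n_atoms in (20, 30, 60):
        needed = bits_needed(n_atoms, 1.0)
        print(n_atoms, needed, fits_in_one_descriptor(n_atoms, 1.0))
```

Under these assumptions, a molecule of a few dozen atoms needs fewer bits than a single floating-point descriptor carries, which is why sharing even one high-precision descriptor cannot be considered safe in principle.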
