Impact of different software implementations on the performance of the Maxmin method for diverse subset selection

When ‘maximally diverse’ compounds are selected from a large pool of molecules by an automated method, the usefulness of the approach depends not only on the choice of algorithm but, critically, on how that algorithm is implemented. This study compares the execution speed of two implementations of the Maxmin algorithm, one in C and one in Java, for selecting maximally diverse subsets from large compound collections. The versions are compared under various C compiler options and Java virtual machines. The analysis shows that the Maxmin algorithm can be implemented in both languages with sufficient execution speed. For large compound libraries the Java version outperforms the C version. While the Java version selects the same compounds regardless of the virtual machine used, the C version produces slightly different subsets depending on the compiler and the optimization settings.
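For readers unfamiliar with Maxmin selection, the following is a minimal sketch of the general technique, not the C or Java code evaluated in the study. It assumes that each compound is represented by a numeric descriptor vector, that dissimilarity is the Euclidean distance, that the first compound serves as the seed, and that ties are resolved in favour of the lowest index; the function name `maxmin_select` and its parameters are hypothetical.

```python
import math


def maxmin_select(vectors, k, seed_index=0):
    """Pick k indices so that each new pick is as far as possible
    from its nearest already-selected neighbour (Maxmin).

    Assumptions for this sketch: Euclidean distance, a fixed seed,
    and ties resolved in favour of the lowest index.
    """
    n = len(vectors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")

    selected = [seed_index]
    # min_dist[i] holds the distance from compound i to its nearest
    # selected compound; -1.0 marks compounds that are already selected.
    min_dist = [math.dist(v, vectors[seed_index]) for v in vectors]
    min_dist[seed_index] = -1.0

    while len(selected) < k:
        # max() returns the first maximal element, i.e. the lowest index.
        best = max(range(n), key=lambda i: min_dist[i])
        selected.append(best)
        min_dist[best] = -1.0
        # Only the distance to the newest pick can lower a minimum,
        # so one pass over the pool per selection step is enough.
        for i, v in enumerate(vectors):
            if min_dist[i] < 0:
                continue
            d = math.dist(v, vectors[best])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy pool of five two-dimensional "descriptor vectors".
pool = [(0, 0), (1, 0), (10, 0), (0, 10), (0.5, 0.5)]
picked = maxmin_select(pool, 3)
```

Each step depends on comparing floating-point distances, so near-ties can in principle be resolved differently when the arithmetic is evaluated differently. This may be one reason why compiled versions differ in the subsets they select, although the abstract does not state the cause.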
