Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the period 1985-2002. Four major research areas are discussed, each involving the development of methods for one of the following: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 of the Group's publications shows that they attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.
