Towards in-house searching of Markush structures from patents

Most large pharmaceutical and biotechnology companies now use Oracle RDBMS chemistry data cartridges to manage databases of individual molecules for chemical structure searching. These systems are often linked to processes for new drug discovery and provide a common interface to a diverse range of specific structure databases. Recently some of these cartridges have been extended to handle Markush representations of un-enumerated combinatorial libraries alongside discrete molecules. An obvious extension would be to enable them to handle the Markush structures from chemical patents, though these have features and complexities not required for the representation of combinatorial libraries. The existing publicly available systems for handling patent Markush structures have changed little in the past 15 years and cannot easily be integrated with in-house systems; in-house access to chemical structures from patents is thus restricted at present to databases of specific molecules. A number of technical issues need to be tackled to enable the existing Markush-capable Oracle cartridges to handle data from patents, and several options are available for obtaining appropriate Markush structure databases for use with them. A demonstration system has been developed using data from Thomson Reuters' World Patents Index Markush File and Digital Chemistry's Oracle cartridge Torus. In-house access to patent Markush data could provide improved informatics support to the drug discovery process, both to enable patentability criteria to be added to computer-assisted drug design and to expand the techniques available for data-mining in the patent literature.
