Automatic generation of Markush structures from specific compounds

Abstract: Markush structures play an important role in cheminformatics, especially in chemical patents. This paper presents a novel algorithm for automatically generating Markush structures from a series of specific compounds. The method can be used to assist patent drafting or to compose combinatorial libraries based on several molecules of interest. To the authors’ knowledge, the presented algorithm is the first solution to this problem. It is available in multiple ChemAxon software products.
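To make the idea of a Markush structure concrete, the toy sketch below collapses a few specific compounds into one shared scaffold with variable R-group lists. It is not the algorithm presented in the paper: it assumes the compounds have already been aligned to a common scaffold, and the function names (`build_markush`, `library_size`), the scaffold string, and the input layout are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from math import prod


def build_markush(scaffold, compounds):
    """Collapse pre-aligned compounds into a scaffold plus R-group lists.

    Assumption: every compound shares `scaffold` and is given as a dict
    mapping an attachment label ("R1", "R2", ...) to a substituent string.
    """
    r_groups = defaultdict(list)
    for substituents in compounds:
        for label, group in substituents.items():
            if group not in r_groups[label]:
                r_groups[label].append(group)  # keep first-seen order
    return {"scaffold": scaffold, "r_groups": dict(r_groups)}


def library_size(markush):
    """Number of specific compounds the generic structure enumerates to."""
    return prod(len(groups) for groups in markush["r_groups"].values())


# Three specific compounds on the same (placeholder) scaffold.
compounds = [
    {"R1": "F", "R2": "C"},
    {"R1": "Cl", "R2": "C"},
    {"R1": "F", "R2": "CC"},
]

# The scaffold string is an illustrative placeholder, not valid SMILES.
markush = build_markush("c1ccc([R1])cc1[R2]", compounds)
print(markush["r_groups"])
print(library_size(markush))
```

In this sketch the three input compounds yield two alternatives at each position, so the generic structure covers four compounds, one more than was supplied. The same property is what makes such a structure usable as a combinatorial library definition.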
