Exploring sets of molecules from patents and relationships to other active compounds in chemical space networks

Patents from medicinal chemistry represent a rich source of novel compounds and activity data, much of which appears only infrequently in the scientific literature. Moreover, patent information is a primary focal point for drug discovery. Accordingly, text mining and image extraction approaches have become hot topics in patent analysis, and repositories of patent data are being established. In this work, we have generated network representations using alternative similarity measures to systematically compare molecules from patents with other bioactive compounds, visualize similarity relationships, explore the chemical neighbourhood of patent molecules, and identify closely related compounds with different activities. Network representations that combine patent molecules with other bioactive compounds place patent information in the context of current bioactive chemical space. They aid in the analysis of patents and further extend the use of molecular networks to explore structure–activity relationships.
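A chemical space network of the kind described above can be sketched as a threshold graph. Compounds are nodes, and an edge joins two compounds whose pairwise similarity meets a cutoff. The minimal Python sketch below makes several assumptions. It stores binary fingerprints as sets of on-bit positions and uses the Tanimoto coefficient. The 0.55 cutoff, the function names, and the toy data are all hypothetical and are not taken from this work. The `similarity` parameter marks where an alternative measure, such as a substructure-based one, could be swapped in.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # on-bit positions of a binary fingerprint


def tanimoto(fp_a: Fingerprint, fp_b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def build_network(
    fingerprints: Dict[str, Fingerprint],
    threshold: float = 0.55,  # hypothetical cutoff, not taken from the paper
    similarity: Callable[[Fingerprint, Fingerprint], float] = tanimoto,
) -> Dict[str, Set[str]]:
    """Connect every pair of compounds whose similarity reaches the threshold."""
    adjacency: Dict[str, Set[str]] = {cid: set() for cid in fingerprints}
    for a, b in combinations(fingerprints, 2):
        if similarity(fingerprints[a], fingerprints[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def neighbours_with_different_activity(
    adjacency: Dict[str, Set[str]],
    activity: Dict[str, str],
    patent_ids: Set[str],
) -> Dict[str, List[str]]:
    """For each patent compound, list non-patent neighbours with a different activity label."""
    return {
        pid: sorted(
            n for n in adjacency[pid]
            if n not in patent_ids and activity[n] != activity[pid]
        )
        for pid in patent_ids
    }


if __name__ == "__main__":
    # Toy data: fingerprints and target labels are invented for illustration.
    fps = {
        "P1": frozenset({1, 2, 3, 4}),     # patent compound
        "C1": frozenset({1, 2, 3, 5}),     # Tc with P1 = 3/5 = 0.6
        "C2": frozenset({1, 2, 3, 4, 6}),  # Tc with P1 = 4/5 = 0.8
        "C3": frozenset({7, 8, 9}),        # no shared bits with any other compound
    }
    activity = {"P1": "target A", "C1": "target B", "C2": "target A", "C3": "target C"}
    net = build_network(fps)
    print(neighbours_with_different_activity(net, activity, {"P1"}))
    # prints {'P1': ['C1']}
```

In practice the fingerprints would come from a cheminformatics toolkit, and the resulting edge list could be exported to a graph layout tool for visualization. Flagging neighbours whose activity label differs is just one plausible way to surface closely related compounds with different activities.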
