Understanding Markush Structures in Chemistry Documents With Deep Learning

The retrieval of chemistry documents depends on recognizing chemical structural formulas (CSFs) in addition to text, tables, and ordinary figures. A Markush structure represents a series of chemicals with similar structures. This research proposes a complete approach to analyzing Markush structures, comprising deep-learning-based detection of CSFs, recognition of Markush structures and their corresponding tables, and reconstruction of the synthetic CSFs represented by a single Markush structure. Experimental results demonstrate that the proposed method extracts CSFs accurately from the original chemistry documents and that its automatically reconstructed molecules support further retrieval of Markush structures more efficiently than the existing approach based on manual drawing.
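The reconstruction step can be illustrated with a minimal sketch. It assumes that a Markush structure has already been recognized as a scaffold with named substituent positions and that its table has been parsed into a list of candidate groups for each position. The function name `enumerate_markush`, the template syntax with `{R1}`-style placeholders, and the example scaffold and substituents are all hypothetical and chosen only for illustration; they are not the method described in this paper, which works from detected images and tables.

```python
from itertools import product


def enumerate_markush(scaffold, substituent_table):
    """Expand a scaffold template into every concrete structure it represents.

    scaffold: a SMILES-like string with placeholders such as "{R1}"
              (hypothetical notation; SMILES itself does not use curly braces).
    substituent_table: maps each placeholder name to its candidate fragments.
    """
    names = sorted(substituent_table)  # fixed order gives reproducible output
    choices = (substituent_table[name] for name in names)
    # One concrete structure per combination of substituents.
    return [
        scaffold.format(**dict(zip(names, combo)))
        for combo in product(*choices)
    ]


# Hypothetical example: a para-disubstituted benzene ring.
scaffold = "{R1}c1ccc({R2})cc1"
table = {
    "R1": ["C", "CC"],       # methyl, ethyl
    "R2": ["O", "N", "Cl"],  # hydroxyl, amino, chloro
}

molecules = enumerate_markush(scaffold, table)
for smiles in molecules:
    print(smiles)
```

With two choices for `R1` and three for `R2`, the sketch yields six concrete structures, which shows why a single Markush structure can stand for a whole series of chemicals and why automatic enumeration is faster than drawing each one by hand.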
