Chemical Machine Vision: Automated Extraction of Chemical Metadata from Raster Images

We present a novel application of machine vision methods for identifying chemical composition diagrams in two-dimensional digital raster images. The method uses Gabor wavelets and an energy function to derive feature vectors from the images. These feature vectors are used to train a Kohonen network, which then classifies images using the Euclidean distance norm. We compare this method with previous approaches that transform such images into a molecular connection table; those approaches are designed to achieve complete atom connection table fidelity, but at the cost of requiring human interaction. The present texture-based approach is complementary: it attempts to recognize higher-order features, such as whether the original raster image contains a chemical representation. This information can be used to provide chemical metadata descriptors for the original image as part of a robot-based Internet resource discovery tool.
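
The abstract says only that Gabor wavelets and an energy function yield the feature vectors; it does not give the filter bank or the exact energy function. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It uses a hypothetical bank of quadrature (cosine/sine) Gabor pairs at two wavelengths and four orientations, and it takes the energy of each filter as the mean of even² + odd² over the image. That is a common choice in Gabor texture analysis but is assumed here. The names gabor_pair, gabor_bank and gabor_energy_features, and the default parameters, are illustrative.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Kernel = List[List[float]]


def gabor_pair(size: int, wavelength: float, theta: float,
               sigma: float) -> Tuple[Kernel, Kernel]:
    """Even (cosine) and odd (sine) Gabor kernels under an isotropic Gaussian.

    size should be odd so the kernel is centred on a pixel.
    """
    half = size // 2
    even: Kernel = []
    odd: Kernel = []
    for y in range(-half, half + 1):
        row_e, row_o = [], []
        for x in range(-half, half + 1):
            # Project (x, y) onto the carrier direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            env = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row_e.append(env * math.cos(2.0 * math.pi * xr / wavelength))
            row_o.append(env * math.sin(2.0 * math.pi * xr / wavelength))
        even.append(row_e)
        odd.append(row_o)
    # Remove the DC term of the even kernel so flat regions give ~zero energy
    # (the odd kernel is antisymmetric and already sums to ~zero).
    n = len(even) * len(even[0])
    mean_e = sum(map(sum, even)) / n
    even = [[v - mean_e for v in row] for row in even]
    return even, odd


def correlate_valid(img: Image, ker: Kernel) -> List[float]:
    """Slide the kernel over every position where it fits entirely (no padding)."""
    k = len(ker)
    out = []
    for i in range(len(img) - k + 1):
        for j in range(len(img[0]) - k + 1):
            out.append(sum(img[i + a][j + b] * ker[a][b]
                           for a in range(k) for b in range(k)))
    return out


def gabor_bank(wavelengths=(4.0, 8.0), n_orient=4, size=9, sigma=2.5):
    """Hypothetical bank: every wavelength at n_orient evenly spaced orientations."""
    bank = []
    for lam in wavelengths:
        for t in range(n_orient):
            theta = t * math.pi / n_orient
            even, odd = gabor_pair(size, lam, theta, sigma)
            bank.append((lam, theta, even, odd))
    return bank


def gabor_energy_features(img: Image, bank) -> List[float]:
    """One feature per filter: mean quadrature energy even^2 + odd^2 over the image."""
    feats = []
    for _, _, even, odd in bank:
        re = correlate_valid(img, even)
        im = correlate_valid(img, odd)
        feats.append(sum(r * r + q * q for r, q in zip(re, im)) / len(re))
    return feats
```

Because the bank spans several orientations, the resulting vector encodes both the scale and the direction of the dominant texture. For example, vertical and horizontal stripe patterns peak at different filters. Direct correlation costs O(k²) per pixel, so for real images an FFT-based convolution would be the more practical choice.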

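The abstract states that the feature vectors train a Kohonen network that classifies with the Euclidean distance norm, but it does not give the map size, the learning schedule, or how nodes acquire class labels. The following is a minimal sketch under those gaps, not the authors' method. It assumes a small rectangular self-organizing map with a Gaussian neighborhood and a linearly decaying learning rate. After unsupervised training, each node is labelled by majority vote of the training vectors it wins, which is a common way to turn a SOM into a classifier but is assumed here. The class KohonenMap, its parameters, and the labels "chemical"/"other" are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KohonenMap:
    """Rectangular self-organizing map with a Gaussian neighborhood (hypothetical)."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0):
        self.rows, self.cols = rows, cols
        self.rng = random.Random(seed)
        self.weights: List[List[float]] = [
            [self.rng.random() for _ in range(dim)] for _ in range(rows * cols)]
        self.labels: Dict[int, str] = {}

    def bmu(self, x: Sequence[float]) -> int:
        """Best-matching unit: index of the node whose weights are nearest to x."""
        return min(range(len(self.weights)),
                   key=lambda i: euclidean(self.weights[i], x))

    def train(self, data: List[Sequence[float]], epochs: int = 50,
              lr0: float = 0.5) -> None:
        radius0 = max(self.rows, self.cols) / 2.0
        total, step = epochs * len(data), 0
        for _ in range(epochs):
            order = list(range(len(data)))
            self.rng.shuffle(order)
            for idx in order:
                x = data[idx]
                frac = step / total
                lr = lr0 * (1.0 - frac)                    # linearly decaying rate
                radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
                br, bc = divmod(self.bmu(x), self.cols)
                for i, w in enumerate(self.weights):
                    r, c = divmod(i, self.cols)
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])     # pull node toward input
                step += 1

    def label_nodes(self, data: List[Sequence[float]], labels: Sequence[str]) -> None:
        """Give each winning node the majority label of the vectors it wins."""
        votes: Dict[int, Counter] = {}
        for x, y in zip(data, labels):
            votes.setdefault(self.bmu(x), Counter())[y] += 1
        self.labels = {n: c.most_common(1)[0][0] for n, c in votes.items()}

    def classify(self, x: Sequence[float]) -> str:
        """Label of the nearest labelled node; call label_nodes first."""
        node = min(self.labels, key=lambda i: euclidean(self.weights[i], x))
        return self.labels[node]


if __name__ == "__main__":
    # Toy 2-D stand-ins for image feature vectors.
    chem = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.85, 0.15]]
    other = [[0.1, 0.9], [0.2, 0.8], [0.05, 0.95], [0.15, 0.85]]
    som = KohonenMap(3, 3, dim=2)
    som.train(chem + other)
    som.label_nodes(chem + other, ["chemical"] * 4 + ["other"] * 4)
    print(som.classify([0.88, 0.1]), som.classify([0.12, 0.9]))
```

Labelling nodes only after training keeps the map itself unsupervised. An unseen image is then assigned the class of the nearest labelled node under the Euclidean norm. In the setting described in the abstract, the inputs would be Gabor energy vectors rather than the toy two-dimensional points used here.
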