Automatic Interpretation of Chemical Structure Diagrams

Chemical structure diagrams, like engineering drawings, maps, and other technical diagrams, consist of solid and dashed lines (bonds), characters (atom symbols), and other symbols such as brackets, parentheses, wedges (stereo-up bonds), and dashed wedges (stereo-down bonds). Recognizing these low-level elements is not enough, because the drawings contain further ambiguities: bond intersections may be crossings or atom nodes, character strings may stand for underlying chemical structure, and circles are sometimes used to represent alternating bonding within a ring. Resolving these correctly requires a considerable knowledge base of chemistry. This paper discusses the general processes used in the program Kekule 1, which embodies this interpretation ability, and explains in more detail how some problems in polygon approximation, dashed-line and dashed-wedge finding, and optical character recognition were solved.
