Machine Learning Enhanced Spectrum Recognition Based on Computer Vision (SRCV) for Intelligent NMR Data Extraction

A machine learning enhanced spectrum recognition system, spectrum recognition based on computer vision (SRCV), has been developed for extracting data from previously analyzed 13C and 1H NMR spectra. The system comprises four functional modules that extract data from three areas of an NMR image: the 13C and 1H chemical shifts, the integrals, and the ranges of the shift values. Three machine learning models were pretrained for number recognition, which is the key step in NMR data extraction. The k-nearest neighbor (kNN) method with an optimized k (k = 4) was selected, which achieved a 100% recognition rate. SRCV was then tested and validated, showing high accuracy and a short processing time (11-21 s) per NMR spectral image. The recognizer enables high-throughput extraction of 13C and 1H NMR data from the many spectra available in the literature and could be used for spectral database construction. In addition, the system could be extended to import data into computer-assisted structure elucidation systems, which would substantially automate that procedure. SRCV is available on GitHub (https://github.com/WJmodels/SRCV).
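To illustrate the number recognition step, the following is a minimal sketch of kNN digit classification with k = 4. It is not the SRCV implementation: the 3x5 glyph templates, the noise model, the sample counts, and the names `GLYPHS`, `noisy_sample`, and `knn_predict` are all assumptions made for illustration, and the synthetic data stand in for digit images cropped from real spectra.

```python
import random
from collections import Counter

# Hypothetical 3x5 binary glyphs for the digits 0-9, flattened row by row.
# These are illustrative templates, not the training data used by SRCV.
GLYPHS = {
    0: "111101101101111",
    1: "010110010010111",
    2: "111001111100111",
    3: "111001111001111",
    4: "101101111001001",
    5: "111100111001111",
    6: "111100111101111",
    7: "111001001001001",
    8: "111101111101111",
    9: "111101111001111",
}


def noisy_sample(digit, rng, noise=0.1):
    """Return the glyph as a list of pixel intensities with uniform noise added."""
    return [int(c) + rng.uniform(-noise, noise) for c in GLYPHS[digit]]


def knn_predict(train, query, k=4):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Sort training samples by squared Euclidean distance to the query.
    nearest = sorted(
        train,
        key=lambda sample: sum((a - b) ** 2 for a, b in zip(sample[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Break ties in favor of the label whose sample is closest to the query.
    for _, label in nearest:
        if votes[label] == top:
            return label


rng = random.Random(0)  # fixed seed so the sketch is reproducible
train = [(noisy_sample(d, rng), d) for d in GLYPHS for _ in range(5)]
test = [(noisy_sample(d, rng), d) for d in GLYPHS for _ in range(3)]

accuracy = sum(knn_predict(train, x) == y for x, y in test) / len(test)
print(f"recognition rate on synthetic glyphs: {accuracy:.0%}")
```

With an even k such as 4, votes can tie, so the sketch resolves ties by preferring the label of the closest neighbor. This is one possible convention and is not necessarily the one used in SRCV.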
