Machine learning for LC–MS medicinal plants identification

Abstract: Herbal medicines are vigorously marketed but poorly regulated, and the analytical methodology for this field is still taking shape. One particular analytical task is confirming the species identity of medicinal plants used as ingredients. In this work, a machine learning approach was implemented for LC–MS plant species identification. Samples of 36 plant species were analyzed. Peak data (m/z, abundance) from the respective samples were used to develop classification algorithms, namely logistic regression (LR), support vector machine (SVM), and random forest (RF). For most of the machine learning algorithms used, classification accuracies of 95% or higher were obtained on the cross-validation dataset. Massive training datasets are now needed for full-scale application of this approach.
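The workflow outlined in the abstract (peak lists in, cross-validated LR, SVM, and RF classifiers out) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and NumPy are available, it assumes that peak lists are encoded by summing abundances into fixed-width m/z bins (the abstract does not say how peaks were featurized), and it uses synthetic data in place of real LC–MS measurements. The helper names `bin_peaks`, `make_synthetic_samples`, and `evaluate_classifiers`, along with the m/z range, bin width, and model settings, are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def bin_peaks(peaks, mz_min=100.0, mz_max=1000.0, bin_width=1.0):
    """Turn a list of (m/z, abundance) peaks into a fixed-length vector.

    Assumption: fixed-width binning is only one possible featurization;
    the abstract does not specify the encoding that was actually used.
    """
    n_bins = int(round((mz_max - mz_min) / bin_width))
    vec = np.zeros(n_bins)
    for mz, abundance in peaks:
        if mz_min <= mz < mz_max:
            idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
            vec[idx] += abundance
    total = vec.sum()
    # Normalize to relative abundance so samples are comparable.
    return vec / total if total > 0 else vec


def make_synthetic_samples(n_species=4, n_per_species=10, seed=0):
    """Generate toy peak lists: each species has its own marker peaks."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for species in range(n_species):
        marker_mz = rng.uniform(100.0, 1000.0, size=8)
        for _ in range(n_per_species):
            mz = marker_mz + rng.normal(0.0, 0.05, size=8)   # small m/z jitter
            abundance = rng.uniform(0.5, 1.0, size=8)
            noise_mz = rng.uniform(100.0, 1000.0, size=5)    # random background peaks
            noise_ab = rng.uniform(0.0, 0.2, size=5)
            peaks = list(zip(np.concatenate([mz, noise_mz]),
                             np.concatenate([abundance, noise_ab])))
            X.append(bin_peaks(peaks))
            y.append(species)
    return np.array(X), np.array(y)


def evaluate_classifiers(X, y, n_splits=5, seed=0):
    """Return mean stratified k-fold accuracy for LR, SVM, and RF."""
    models = {
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="linear")),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
        for name, model in models.items()
    }


if __name__ == "__main__":
    X, y = make_synthetic_samples()
    for name, acc in evaluate_classifiers(X, y).items():
        print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Accuracies on this toy data say nothing about real samples; the sketch only shows how the three classifiers can be compared under the same cross-validation splits.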
