Machine Learning of Raman Spectroscopy Data for Classifying Cancers: A Review of the Recent Literature

Raman spectroscopy has long been anticipated to augment clinical decision making, such as classifying oncological samples. Unfortunately, the complexity of Raman data has thus far inhibited its routine use in clinical settings. Traditional machine learning models have been used to help exploit this information, but recent advances in deep learning have the potential to advance the field. However, there are a number of potential pitfalls with both traditional and deep learning models. We conduct a literature review to ascertain the recent machine learning methods used to classify cancers using Raman spectral data. We find that while deep learning models are popular, and ostensibly outperform traditional machine learning models, there are many methodological considerations which may be leading to an over-estimation of performance: primarily small sample sizes, which compound sub-optimal choices regarding sampling and validation strategies. Among our recommendations is a call to collate large benchmark Raman datasets, similar to those that have helped transform digital pathology, which researchers can use to develop and refine deep learning models.
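
As an illustration of how a validation strategy can inflate performance, consider a dataset in which each patient contributes several spectra. If spectra from one patient land in both the training and the test fold, the model may be scored on samples that closely resemble ones it has already seen. The sketch below is an assumption made for illustration, not a method taken from the abstract: the function name `grouped_k_fold`, the round-robin assignment of patients to folds, and the toy patient identifiers are all hypothetical. It splits at the patient level so that no patient appears on both sides of any fold.

```python
from collections import defaultdict


def grouped_k_fold(patient_ids, k):
    """Return k (train_idx, test_idx) pairs with no patient in both sets.

    patient_ids[i] is the patient that spectrum i was measured from.
    Patients (not individual spectra) are assigned to folds round-robin.
    """
    # Collect the spectrum indices that belong to each patient.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    patients = sorted(by_patient)
    if k < 2 or k > len(patients):
        raise ValueError("k must be between 2 and the number of patients")

    folds = []
    for fold in range(k):
        # Every k-th patient, starting at offset `fold`, is held out.
        test_patients = set(patients[fold::k])
        test_idx = [i for p in patients if p in test_patients
                    for i in by_patient[p]]
        train_idx = [i for p in patients if p not in test_patients
                     for i in by_patient[p]]
        folds.append((train_idx, test_idx))
    return folds


# Hypothetical toy data: four patients with an uneven number of spectra.
patient_ids = ["P1", "P1", "P1", "P2", "P2", "P3", "P3", "P3", "P4"]

for train_idx, test_idx in grouped_k_fold(patient_ids, k=2):
    train_patients = {patient_ids[i] for i in train_idx}
    test_patients = {patient_ids[i] for i in test_idx}
    # The defining property of a patient-level split.
    assert not (train_patients & test_patients)
```

With small cohorts, folds built this way hold few patients each, so the resulting performance estimates remain highly variable; the split removes one source of optimism but does not compensate for a small sample size.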
