Application of random forest based approaches to surface-enhanced Raman scattering data

Surface-enhanced Raman scattering (SERS) is a valuable technique for analyzing biological samples. However, when no reporter or label molecules are used, the nature of SERS often makes it challenging to extract the desired information from the generated data. Here, the suitability of random forest based approaches is evaluated using SERS data generated by a simulation framework that is also presented. Specifically, it is demonstrated that important SERS signals can be identified, the relevance of predefined spectral groups can be evaluated, and the relations between different SERS signals can be analyzed. The results show that Boruta and surrogate minimal depth (SMD) should be applied for the selection of important SERS signals, and the competing method Learner of Functional Enrichment (LeFE) for the analysis of spectral groups. Overall, this investigation demonstrates that combining random forest approaches with SERS data is very promising for the sophisticated analysis of complex biological samples.
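The idea of selecting important signals from simulated spectra with a random forest can be illustrated with a minimal Python sketch. It is neither the simulation framework nor the Boruta or SMD implementation evaluated here. It assumes a toy two-class model with one shared and one class-specific Gaussian band, and it uses a simplified shadow-variable comparison in the spirit of Boruta, built on scikit-learn. All function names, band positions, noise levels, and the selection threshold are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def simulate_spectra(n_per_class=60, n_points=120, seed=0):
    """Toy two-class spectra (hypothetical model, not the paper's framework)."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(400, 1800, n_points)  # illustrative Raman shift axis, cm^-1

    def band(center, width=15.0):
        return np.exp(-0.5 * ((axis - center) / width) ** 2)

    y = np.repeat([0, 1], n_per_class)
    n = y.size
    # Shared band whose intensity fluctuates from spectrum to spectrum
    X = rng.uniform(0.5, 1.5, (n, 1)) * band(1000.0)
    # Class-specific band: weak in class 0, strong in class 1
    X = X + (0.2 + 0.8 * y)[:, None] * rng.uniform(0.5, 1.5, (n, 1)) * band(1350.0)
    # Additive measurement noise
    X = X + rng.normal(0.0, 0.05, X.shape)
    return axis, X, y


def shadow_selection(X, y, n_rounds=10, seed=0):
    """Count how often each variable beats the best permuted 'shadow' variable."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    hits = np.zeros(p, dtype=int)
    for r in range(n_rounds):
        # Shadows: every column permuted independently, which destroys
        # any relation to the class labels
        shadows = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(n_estimators=200, random_state=seed + r)
        forest.fit(np.hstack([X, shadows]), y)
        imp = forest.feature_importances_
        hits += imp[:p] > imp[p:].max()
    return hits


n_rounds = 10
axis, X, y = simulate_spectra()
hits = shadow_selection(X, y, n_rounds=n_rounds)
# Hypothetical decision rule: keep variables that win in at least 80% of rounds
selected = axis[hits >= 0.8 * n_rounds]
print("Selected Raman shifts (cm^-1):", np.round(selected, 1))
```

In this toy setting, the selected positions are expected to cluster around the class-specific band at 1350 cm^-1, while the shared band at 1000 cm^-1 should rarely be selected because its intensity fluctuations are unrelated to the class label. The Boruta package replaces the fixed threshold used here with a statistical test over the accumulated hits.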
