Extracting biological information with computational analysis of Fourier-transform infrared (FTIR) biospectroscopy datasets: current practices to future perspectives.

The application of Fourier-transform infrared (FTIR) spectroscopy, or related technologies such as Raman spectroscopy, to biological questions (termed biospectroscopy) is relatively novel. Potential fields of application include cytological, histological and microbial studies. Biospectroscopy potentially provides a rapid and non-destructive approach to clinical diagnosis. Its growing use is primarily a consequence of advances in instrumentation and computational techniques. In the coming decades, biospectroscopy is likely to become a common tool in the screening or diagnostic laboratory, or even in the general practitioner's clinic. Despite many advances in the biological application of FTIR spectroscopy, challenges remain in sample preparation, instrumentation and data handling. We focus on the last of these, data handling, and identify in the reviewed literature four main study goals: Pattern Finding, Biomarker Identification, Imaging, and Diagnosis. These can be grouped into two frameworks: Exploratory and Diagnostic. Existing techniques in Quality Control, Pre-processing, Feature Extraction, Clustering, and Classification are critically reviewed. A recurring theme is the choice of method. Based on the state of the art, we argue that near-future research should focus on dataset standardization, building information systems, the development and validation of data analysis tools, and technology transfer. A diagnostic case study using a real-world dataset is presented as an illustration. Many of the methods presented in this review are Machine Learning and Statistical techniques that can be extended to other forms of computer-based biomedical analysis, including mass spectrometry and magnetic resonance.
