Chemometrics and qualitative analysis have a vibrant relationship

In analytical chemistry, qualitative analysis is often associated with compound identification, while chemometrics offers a wide spectrum of data-analysis methods that extend qualitative analysis beyond compound identification. Any chemical analysis that has a qualitative goal can or should be considered qualitative chemical analysis. Thanks to chemometrics, both qualitative and quantitative data can be included in qualitative analysis and modeled towards a qualitative goal. We provide an extensive overview of the vibrant relationship between chemometrics and qualitative analysis, covering chemometric methods, their real-life applications in qualitative analysis, the challenges involved, and possible solutions. The role of chemometrics is expected to become pivotal as more possibilities of qualitative analysis are explored and new chemometric approaches are developed for high-dimensional data.
