Utilities for quantifying separation in PCA/PLS-DA scores plots.

Metabolic fingerprinting studies rely on interpretations drawn from low-dimensional representations of spectral data generated by multivariate methods such as principal components analysis (PCA) and projection to latent structures discriminant analysis (PLS-DA). As metabolic fingerprinting and chemometric analyses built on these low-dimensional scores plots become more common, quantitative statistical measures are needed to determine whether the differences between experimental groups are significant. Our updated version of the PCAtoTree software provides methods to reliably visualize and quantify separations in scores plots. It generates dendrograms that use both nonparametric and parametric hypothesis testing to assess node significance, and scores plots that show 95% confidence ellipsoids for all experimental groups.
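
To make the idea of quantifying separation concrete, the following is a minimal sketch for two-dimensional scores, not the PCAtoTree implementation. It assumes each group's scores are roughly bivariate normal, scales the ellipse with a chi-square quantile (the published software may use a different scaling), and measures the distance between group centroids with a pooled-covariance Mahalanobis distance. The function names `mean_and_cov`, `confidence_ellipse`, and `mahalanobis_between` are hypothetical and chosen only for illustration.

```python
import math

def mean_and_cov(points):
    """Return the centroid and 2x2 sample covariance of 2D score points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def confidence_ellipse(points, level=0.95):
    """Center, semi-axes, and rotation of a chi-square-based ellipse (assumed model)."""
    center, (sxx, sxy, syy) = mean_and_cov(points)
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    q = -2.0 * math.log(1.0 - level)
    # Eigenvalues of the symmetric 2x2 covariance matrix.
    half_trace = (sxx + syy) / 2.0
    root = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = half_trace + root, half_trace - root
    # Direction of the major axis, in radians from the first score axis.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return center, math.sqrt(q * lam1), math.sqrt(q * max(lam2, 0.0)), angle

def mahalanobis_between(group_a, group_b):
    """Mahalanobis distance between two centroids using the pooled covariance."""
    (ax, ay), cov_a = mean_and_cov(group_a)
    (bx, by), cov_b = mean_and_cov(group_b)
    na, nb = len(group_a), len(group_b)
    sxx, sxy, syy = (
        ((na - 1) * a + (nb - 1) * b) / (na + nb - 2)
        for a, b in zip(cov_a, cov_b)
    )
    det = sxx * syy - sxy ** 2
    dx, dy = ax - bx, ay - by
    # d^T S^-1 d with the 2x2 inverse written out explicitly.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Made-up scores for two groups, used only to exercise the functions.
control = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
treated = [(x + 3.0, y) for x, y in control]
center, semi_major, semi_minor, angle = confidence_ellipse(control)
distance = mahalanobis_between(control, treated)
```

A pairwise matrix of such distances is the kind of input a tree-building step could work from, while the ellipse parameters are enough to draw each group's region on a scores plot.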