Principal component analysis for designed experiments

Background
Principal component analysis is used to summarize matrix data, such as those obtained from transcriptome, proteome, or metabolome profiling and from medical examinations, into fewer dimensions by fitting the matrix to orthogonal axes. Although this methodology is frequently used in multivariate analyses, it has disadvantages when applied to experimental data. First, the identified principal components have poor generality: because the size and direction of the components depend on the particular data set, the components are valid only within that data set. Second, the method is sensitive to experimental noise and to bias between sample groups. It cannot reflect an experimental design planned to manage that noise and bias; instead, it assigns equal weight to all samples in the matrix and treats them as independent. Third, the resulting components are often difficult to interpret. To address these issues, several options were introduced to the methodology. First, the principal axes were identified using training data sets and shared across experiments. These training data reflect the design of the experiments, and their preparation allows noise to be reduced and group bias to be removed. Second, the center of rotation was determined in accordance with the experimental design. Third, the resulting components were scaled to a common unit of size.

Results
The effects of these options were evaluated in microarray experiments, which showed improved separation of groups and robustness to noise. The range of the scaled scores was unaffected by the number of items. Additionally, unknown samples were appropriately classified using pre-arranged axes. Furthermore, these axes reflected the characteristics of the groups in the experiments well. As observed, scaling the components and sharing the axes enabled comparisons of components across experiments.
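The three options above can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`fit_design_axes`, `project`, `classify`) are hypothetical, and so are the specific choices of averaging replicates into one profile per group, rotating around either the unweighted mean of the group means or a named reference group, dividing scores by the square root of the number of items, and assigning unknown samples to the nearest training group. The data are synthetic.

```python
import numpy as np


def fit_design_axes(replicates, center="grand"):
    """Fit principal axes on designed training data (hypothetical helper).

    replicates: dict of group name -> array of shape (n_replicates, n_items).
    center: "grand" rotates around the unweighted mean of the group means;
        a group name (e.g. "control") rotates around that group's mean.
    """
    names = list(replicates)
    # Assumption: average replicates so each group contributes one
    # noise-reduced profile and larger groups do not get more weight.
    means = np.vstack([np.asarray(replicates[g], dtype=float).mean(axis=0)
                       for g in names])
    origin = means.mean(axis=0) if center == "grand" else means[names.index(center)]
    # Principal axes are the right singular vectors of the centered means.
    _, s, vt = np.linalg.svd(means - origin, full_matrices=False)
    keep = s > 1e-10 * s[0]  # drop directions that carry no variance
    return {"names": names, "means": means, "origin": origin, "axes": vt[keep]}


def project(model, samples):
    """Score samples on the shared axes, divided by sqrt(n_items).

    Assumed scaling: a shift of d on every item, along an axis that weights
    all items equally, scores exactly d whatever the number of items.
    """
    x = np.atleast_2d(np.asarray(samples, dtype=float))
    return (x - model["origin"]) @ model["axes"].T / np.sqrt(x.shape[1])


def classify(model, samples, n_axes=2):
    """Assign each sample to the training group whose scaled score is nearest."""
    train = project(model, model["means"])[:, :n_axes]
    test = project(model, samples)[:, :n_axes]
    dist = np.linalg.norm(test[:, None, :] - train[None, :, :], axis=2)
    return [model["names"][i] for i in dist.argmin(axis=1)]


# Synthetic demo: three designed groups and two unlabeled samples.
rng = np.random.default_rng(0)
p = 500                      # number of items, e.g. genes
base = rng.normal(size=p)    # shared baseline profile
effect = rng.normal(size=p)  # treatment effect profile
training = {
    "control": base + 0.3 * rng.normal(size=(3, p)),
    "treated": base + effect + 0.3 * rng.normal(size=(3, p)),
    "recovery": base + 0.5 * effect + 0.3 * rng.normal(size=(3, p)),
}
model = fit_design_axes(training)
unknown = base + effect + 0.3 * rng.normal(size=(2, p))
print(classify(model, unknown))
```

Because the origin, axes, and scaling live in `model`, the same object can score samples from later experiments, which is one way to realize the sharing of axes described above. The abstract does not state which scaling the authors used, so the square-root divisor here is only an example of a scaling whose range does not grow with the number of items.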
The use of training data reduced the effects of noise and bias in the data and facilitated the physical interpretation of the principal axes.

Conclusions
Together, the introduced options improve the generality and objectivity of the analytical results. The methodology thus becomes more like a set of multiple regression analyses, each finding an independent model that specifies one of the axes.