Introduction to residuals and estimation by principal components analysis

• It is not always necessary to use all non-zero PCs in a model.
• A partial model provides an approximation to the data.
• We can write this as ${}^{A}\hat{\mathbf{X}} = {}^{A}\mathbf{T}\,{}^{A}\mathbf{P}$ or $\mathbf{X} = {}^{A}\mathbf{T}\,{}^{A}\mathbf{P} + {}^{A}\mathbf{R}$, where ${}^{A}\mathbf{T}$ and ${}^{A}\mathbf{P}$ are the scores and loadings using A PCs, ${}^{A}\hat{\mathbf{X}}$ is the approximation of $\mathbf{X}$ using this model and ${}^{A}\mathbf{R}$ is the residual matrix. The elements of ${}^{A}\mathbf{R}$ are the residuals for each measurement after fitting a model based on A principal components.
• $\mathbf{R}$ is often alternatively called the error matrix and is sometimes denoted $\mathbf{E}$. Strictly speaking, however, these are different: errors are a consequence of experimentation and cannot normally be measured, whereas residuals are a consequence of the modelling. Ideally, if the model is a good one, the two should be very similar, and many articles in the chemometrics literature do not distinguish between them.
• In the case of the data of Table 1 of a previous article, if we set A = 1, then $\mathbf{X} = {}^{1}\mathbf{T}\,{}^{1}\mathbf{P} + {}^{1}\mathbf{R}$ is the model using just one PC.
• As the scores and loadings of one PC are vectors, we could rewrite this as $\mathbf{X} = \mathbf{t}_1\mathbf{p}_1 + {}^{1}\mathbf{R}$, where "1" refers to a model with one component. The results are presented numerically in Table 1. Note that as there are only two components in the data, in this particular case we also have ${}^{1}\mathbf{R} = \mathbf{t}_2\mathbf{p}_2$; that is, if we feel that one component is adequate to approximate the data, the second component represents what is sometimes called error, but is more properly defined as residuals, as discussed above.

DOI: 10.1002/cem.3407
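The decomposition into a one-component approximation and a residual matrix can be sketched numerically. The following is a minimal illustration only, under several assumptions: the data matrix is hypothetical (it is not the Table 1 data), PCA is computed with NumPy's singular value decomposition, and the data are left uncentred because the preprocessing is not specified here. The names `X_hat`, `R` and `R_from_pc2` are illustrative.

```python
import numpy as np

# Hypothetical 5 x 2 data matrix (illustrative values, NOT the article's Table 1).
X = np.array([
    [2.0, 1.9],
    [4.1, 4.2],
    [5.9, 6.3],
    [8.2, 7.8],
    [9.8, 10.1],
])

# PCA via SVD, X = U diag(s) Vt. No centring is applied (an assumption).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s   # scores: one column per PC
P = Vt      # loadings: one row per PC

A = 1                          # number of PCs retained in the model
X_hat = T[:, :A] @ P[:A, :]    # approximation of X using A PCs
R = X - X_hat                  # residual matrix after A PCs

# With only two non-zero PCs, the one-component residual is t2 p2.
R_from_pc2 = np.outer(T[:, 1], P[1, :])
residual_matches_pc2 = np.allclose(R, R_from_pc2)
print(residual_matches_pc2)
```

With two variables there are at most two non-zero PCs, so whatever the first component leaves unexplained is exactly the contribution of the second. With more variables, `R` would instead collect the contributions of all discarded components.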