The application of principal component analysis to stylometry

In recent years, principal component analysis has become popular in computational stylistics, particularly in studies of authorship. The mathematical nature of the theory underpinning the method makes it rather inaccessible to linguists and literary scholars, and confidence in its correct application is consequently diminished. The procedure is first restricted to two marker words, from which a pictorial description of its operation is derived. Some characteristics of the method are then examined. Finally, in the context of a Shakespearean example, the technique is extended to p words, and suggestions are advanced to alleviate possible shortcomings.