Applying Classification Separability Analysis to Microarray Data

We describe a novel approach to processing genome-wide expression data from multiple arrays that represent different classes of experimental conditions. We first derive a new unified maximum separability analysis (UMSA) procedure for constructing linear classifiers and show that it unifies classical linear discriminant analysis and the optimal margin hyperplane method used in support vector machines. We then present a stepwise backward algorithm that uses UMSA to compute significance scores for individual genes based on their collective contribution to the separation of different classes of arrays. Using public data sets for the budding yeast Saccharomyces cerevisiae, we demonstrate the effectiveness of the UMSA-based algorithms in identifying the genes with the most discriminatory power in separating arrays of cells under normal division cycles from those under heat shock.
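
The stepwise backward idea can be illustrated with a minimal sketch. The abstract does not give the UMSA formulation, so this example substitutes a ridge-regularized Fisher linear discriminant as the linear classifier; it is a stand-in, not UMSA. The function names, the `ridge` parameter, and the contribution measure (absolute weight times the gene's standard deviation) are assumptions made for illustration only.

```python
import numpy as np

def fisher_direction(X, y, ridge=1e-3):
    """Ridge-regularized Fisher discriminant direction (a stand-in for UMSA).

    X: (arrays x genes) expression matrix; y: 0/1 class label per array.
    """
    X0, X1 = X[y == 0], X[y == 1]
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    C0 = X0 - X0.mean(axis=0)
    C1 = X1 - X1.mean(axis=0)
    # Pooled within-class scatter; the ridge term keeps it invertible
    # when there are more genes than arrays.
    Sw = C0.T @ C0 + C1.T @ C1 + ridge * np.eye(X.shape[1])
    return np.linalg.solve(Sw, diff)

def backward_gene_ranking(X, y, ridge=1e-3):
    """Return gene indices ordered from most to least discriminatory.

    At each step the classifier is refit on the remaining genes and the
    gene with the smallest contribution is removed (hypothetical scoring).
    """
    remaining = list(range(X.shape[1]))
    removed = []
    while len(remaining) > 1:
        w = fisher_direction(X[:, remaining], y, ridge)
        # Scale each weight by the gene's spread so genes are comparable.
        contrib = np.abs(w) * X[:, remaining].std(axis=0)
        worst = int(np.argmin(contrib))
        removed.append(remaining.pop(worst))
    removed.append(remaining[0])
    # Genes eliminated last are the most discriminatory.
    return removed[::-1]
```

Calling `backward_gene_ranking(X, y)` on an expression matrix with one row per array and one column per gene returns a ranking in which genes that survive elimination longest come first. Refitting after every removal is what makes each score reflect a gene's collective contribution rather than its separability in isolation.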
