Boosting accuracy of automated classification of fluorescence microscope images for location proteomics

Background: Detailed knowledge of the subcellular location of each expressed protein is critical to a full understanding of its function. Fluorescence microscopy, in combination with methods for fluorescent tagging, is currently the most suitable method for proteome-wide determination of subcellular location. Previous work has shown that neural network classifiers can distinguish all major protein subcellular location patterns in both 2D and 3D fluorescence microscope images. Building on these results, we evaluate new classifiers and features to improve the recognition of these patterns in both 2D and 3D images.

Results: We report a thorough comparison of eight state-of-the-art classification methods on this problem: neural networks; support vector machines with linear, polynomial, radial basis, and exponential radial basis kernel functions; and the ensemble methods AdaBoost, Bagging, and Mixtures-of-Experts. Ten-fold cross validation was used to evaluate each classifier with various parameters on different Subcellular Location Feature sets representing both 2D and 3D fluorescence microscope images, including new feature sets incorporating features derived from Gabor and Daubechies wavelet transforms. After optimal parameters were chosen for each of the eight classifiers, optimal majority-voting ensemble classifiers were formed for each feature set. Comparing the per-image results of all eight classifiers permits estimation of a lower bound on the classification error rate for each subcellular pattern, which we interpret as reflecting the fraction of cells whose patterns are distorted by mitosis, cell death, or acquisition errors. Overall, we obtained statistically significant improvements in classification accuracy over the best previously published results, with the overall error rate reduced by one-third to one-half and the average accuracy for single 2D images exceeding 90% for the first time. In particular, the classification accuracy for the easily confused endomembrane compartments (endoplasmic reticulum, Golgi, endosomes, lysosomes) was improved by 5–15%. We achieved further improvements when classification was conducted on image sets rather than on individual cell images.

Conclusions: The availability of accurate, fast, automated classification systems for protein location patterns, in conjunction with high-throughput fluorescence microscope imaging techniques, enables a new subfield of proteomics: location proteomics. The accuracy and sensitivity of this approach represent an important alternative to low-resolution assignments by curation or sequence-based prediction.
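
The majority-voting ensemble and the per-pattern lower-bound error estimate described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tie-breaking rule, and the toy labels are all assumptions made for illustration, and the lower bound is computed under one plausible reading of the abstract, namely the fraction of images of each pattern that every classifier misclassifies.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one voted label per image.

    predictions: list of lists, one inner list of labels per classifier,
    all of the same length. Ties go to the earliest classifier in the list
    whose label reached the top count (an assumed tie-breaking rule).
    """
    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(next(label for label in labels if counts[label] == top))
    return voted


def error_rate(predicted, truth):
    """Fraction of images whose predicted label differs from the true label."""
    wrong = sum(p != t for p, t in zip(predicted, truth))
    return wrong / len(truth)


def lower_bound_error(predictions, truth):
    """Per-class fraction of images that every classifier gets wrong."""
    totals = Counter(truth)
    missed = Counter()
    for i, true_label in enumerate(truth):
        if all(clf[i] != true_label for clf in predictions):
            missed[true_label] += 1
    return {label: missed[label] / totals[label] for label in totals}


# Hypothetical toy data: three classifiers, six images, three patterns.
truth = ["ER", "ER", "Golgi", "Golgi", "Lysosome", "Lysosome"]
preds = [
    ["ER", "Golgi", "Golgi", "Golgi", "Lysosome", "Endosome"],
    ["ER", "ER", "Golgi", "Lysosome", "Lysosome", "Endosome"],
    ["Golgi", "ER", "Golgi", "Golgi", "Endosome", "Endosome"],
]

voted = majority_vote(preds)
ensemble_error = error_rate(voted, truth)
individual_errors = [error_rate(p, truth) for p in preds]
bound = lower_bound_error(preds, truth)
```

In this toy case the voted labels are wrong only on the one image that all three classifiers misclassify, so the ensemble error falls below every individual error while the lower bound for that image's pattern stays nonzero.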
