Determining the subcellular location of new proteins from microscope images using local features

MOTIVATION: Previous systems for automated determination of subcellular location from microscope images have been evaluated on datasets in which each location class consisted of multiple images of the same representative protein. Here, we frame a more challenging and more useful problem, in which previously unseen proteins are to be classified.

RESULTS: Using CD-tagging, we generated two new image datasets for evaluating this problem, each containing several different proteins per location class. Evaluating previous methods on these new datasets showed that it is much harder to train a classifier that generalizes across different proteins than one that simply recognizes a protein it was trained on. We therefore developed and evaluated additional approaches that incorporate novel modifications of local feature techniques. These modifications extend the notion of local features to exploit both the protein image and any reference markers that were imaged in parallel. With them, we obtained a large accuracy improvement over existing methods on our new datasets. These features also improve classification on other, previously studied datasets.

AVAILABILITY: The datasets are available for download at http://murphylab.web.cmu.edu/data/. The software was written in Python and C++ and is available under an open-source license at http://murphylab.web.cmu.edu/software/. The code is split into a library, which can be easily reused for other data, and a small driver script for reproducing all results presented here. A step-by-step tutorial on applying the methods to new datasets is also available at that address.

CONTACT: murphy@cmu.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
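The distinction between recognizing a protein seen in training and generalizing to an unseen protein comes down to how images are split for evaluation. The following is a minimal sketch, not the authors' code, of a leave-one-protein-out split in which every image of the held-out protein is excluded from training. The function name `leave_one_protein_out`, the protein identifiers, and the toy labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def leave_one_protein_out(protein_ids):
    """Yield (held_out_protein, train_indices, test_indices).

    Each protein is held out exactly once, so no image of the test
    protein is ever available to the classifier during training.
    """
    by_protein = defaultdict(list)
    for idx, pid in enumerate(protein_ids):
        by_protein[pid].append(idx)
    for held_out in sorted(by_protein):
        test_idx = list(by_protein[held_out])
        train_idx = sorted(
            i
            for pid, idxs in by_protein.items()
            if pid != held_out
            for i in idxs
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy data: eight images, four proteins, two location classes,
# so that each class still has one training protein when another is held out.
proteins = ["protA", "protA", "protB", "protB",
            "protC", "protC", "protD", "protD"]
labels = ["nucleus"] * 4 + ["golgi"] * 4

splits = list(leave_one_protein_out(proteins))
for held_out, train_idx, test_idx in splits:
    train_classes = sorted({labels[i] for i in train_idx})
    print(held_out, "train:", train_idx, "test:", test_idx,
          "classes in training:", train_classes)
```

By contrast, a split that shuffles individual images would place images of the same protein in both the training and test sets, which corresponds to the easier setting described above.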
