Using machine learning to speed up manual image annotation: application to a 3D imaging protocol for measuring single cell gene expression in the developing C. elegans embryo

Background: Image analysis is an essential component of many biological experiments that study gene expression, cell cycle progression, and protein localization. A protocol for tracking the expression of individual C. elegans genes was developed that collects image samples of a developing embryo by 3-D time-lapse microscopy. In this protocol, a program called StarryNite automatically recognizes fluorescently labeled cells and traces their lineage. However, because of the noise present in the data and the challenges introduced by the increasing number of cells in later stages of development, this program is not error free. In the current version, error correction (i.e., editing) is performed manually using a graphical interface tool named AceTree, which was developed specifically for this task. For a single experiment, this manual annotation task takes several hours.

Results: In this paper, we reduce the time required to correct errors made by StarryNite. We target one of the most frequent error types (movements annotated as divisions) and train a support vector machine (SVM) classifier to decide whether a division call made by StarryNite is correct. We show, via cross-validation experiments on several benchmark data sets, that the SVM successfully identifies this type of error. A new version of StarryNite that includes the trained SVM classifier is available at http://starrynite.sourceforge.net.

Conclusions: We demonstrate the utility of a machine learning approach to error annotation for StarryNite. In the process, we also provide some general methodologies for developing and validating a classifier with respect to a given pattern recognition task.
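
The classify-and-validate workflow summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not state the feature set, kernel, or SVM library used, so the sketch assumes a plain linear SVM trained by stochastic subgradient descent on the hinge loss, two hypothetical numeric features per division call, and synthetic toy data. All function names and parameter values below are illustrative assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=20, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}.
    lam (L2 penalty), eta (step size), and epochs are assumed values.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Gradient step for the L2 regularizer: shrink the weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient is non-zero only inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 (division call accepted) or -1 (call flagged as an error)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0.0 else -1


def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k cross-validation folds."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct = sum(1 for i in held_out if predict(w, b, X[i]) == y[i])
        scores.append(correct / len(held_out))
    return sum(scores) / k


def make_toy_calls(n=200, seed=1):
    """Synthetic stand-in for annotated division calls (NOT real data).

    Label +1 mimics a true division, -1 a movement annotated as a division.
    The two features are hypothetical placeholders with no biological meaning.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_calls()
    print("mean 5-fold accuracy on toy data:", k_fold_accuracy(X, y))
```

The point of holding out each fold is that every accuracy figure comes from calls the classifier never saw during training, which is what makes a cross-validated estimate a fair proxy for performance on a new experiment. In practice an established SVM library would normally replace the hand-written trainer here.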
