Machine learning in cell biology – teaching computers to recognize phenotypes

Summary: Recent advances in microscope automation provide new opportunities for high-throughput cell biology, such as image-based screening. Highly complex image analysis tasks often make the implementation of static, predefined processing rules cumbersome. Machine-learning methods instead use the intrinsic structure of the data, as well as the expert annotations of biologists, to infer models that can solve versatile data analysis tasks. Here, we explain how machine-learning methods work and what needs to be considered for their successful application in cell biology. We outline how microscopy images can be converted into a data representation suitable for machine learning, and then introduce various state-of-the-art machine-learning algorithms, highlighting recent applications in image-based screening. Our Commentary aims to provide the biologist with a guide to the application of machine learning to microscopy assays, and we therefore include extensive discussion of how to optimize the experimental workflow as well as the data analysis pipeline.
