Automated Neuron Detection in High-Content Fluorescence Microscopy Images Using Machine Learning

The study of neuronal morphology in relation to function, and the development of effective medicines that positively affect this relationship in patients with neurodegenerative diseases, increasingly involve image-based high-content screening and analysis. The first critical step toward fully automated high-content image analysis in such studies is to detect all neuronal cells and distinguish them from non-neuronal cells or artifacts in the images. Here we investigate how well established machine learning techniques perform at this task: support vector machines, random forests, k-nearest neighbors, and generalized linear model classifiers. These operate on an extensive set of image features extracted with the compound hierarchy of algorithms representing morphology (CHARM) and the scale-invariant feature transform (SIFT). We present experiments on a dataset of rat hippocampal neurons from our own studies to identify the most suitable classifier(s) and feature subset(s) in the common practical setting where very little annotated training data is available. The results indicate that a random forests classifier using the right feature subset ranks best for this task, although its performance is not statistically significantly better than that of some support vector machine based classification models.
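The abstract does not state which significance test underlies the ranking comparison. As a hedged illustration, the sketch below assumes one common recipe for comparing several classifiers: rank them within each evaluation block, test for any overall difference with the Friedman chi-square test, then compare pairs against the Nemenyi critical difference (CD) on mean ranks. The scores, the column labels, and all function names are hypothetical; the numbers are toy values, not results from this study. Treating repeated resampling runs as blocks, as done here, only approximates the test's assumption of independent blocks. Only the Python standard library is used.

```python
import math

# Toy per-run accuracies (hypothetical numbers for illustration only, NOT results
# from the study). Rows are evaluation blocks (e.g. resampling runs); columns are classifiers.
CLASSIFIERS = ["RF", "SVM", "kNN", "GLM"]
SCORES = [
    [0.90, 0.88, 0.80, 0.82],
    [0.86, 0.87, 0.78, 0.80],
    [0.91, 0.89, 0.83, 0.81],
    [0.88, 0.85, 0.79, 0.78],
    [0.89, 0.86, 0.82, 0.82],
]

# Two-tailed Nemenyi critical values q_0.05 for k classifiers (Demsar, 2006).
NEMENYI_Q05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}


def rank_row(scores):
    """Rank one block: highest score gets rank 1; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for p in range(i, j + 1):
            ranks[order[p]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def mean_ranks(table):
    """Average each classifier's rank over all blocks."""
    n = len(table)
    rows = [rank_row(r) for r in table]
    return [sum(col) / n for col in zip(*rows)]


def chi2_sf(x, df):
    """Chi-square upper tail via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while True:
        term *= z / (s + n)
        total += term
        if term < total * 1e-15:
            break
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s) + math.log(total))
    return max(0.0, 1.0 - lower)


def friedman(table):
    """Return (mean ranks, Friedman chi-square statistic, p-value with k-1 df)."""
    n, k = len(table), len(table[0])
    r = mean_ranks(table)
    stat = 12.0 * n / (k * (k + 1)) * (sum(x * x for x in r) - k * (k + 1) ** 2 / 4.0)
    return r, stat, chi2_sf(stat, k - 1)


def nemenyi_cd(k, n):
    """Critical difference in mean rank at alpha = 0.05 for k classifiers, n blocks."""
    return NEMENYI_Q05[k] * math.sqrt(k * (k + 1) / (6.0 * n))


if __name__ == "__main__":
    ranks, stat, p = friedman(SCORES)
    cd = nemenyi_cd(len(CLASSIFIERS), len(SCORES))
    for name, r in zip(CLASSIFIERS, ranks):
        print(f"{name}: mean rank {r:.2f}")
    print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}, Nemenyi CD = {cd:.2f}")
    best = min(range(len(ranks)), key=ranks.__getitem__)
    for j, name in enumerate(CLASSIFIERS):
        if j != best:
            diff = ranks[j] - ranks[best]
            verdict = "significant" if diff > cd else "not significant"
            print(f"{CLASSIFIERS[best]} vs {name}: rank gap {diff:.2f} ({verdict})")
```

The toy table is constructed so that one classifier has the best mean rank while its gap to the runner-up stays below the CD, which is the kind of outcome the abstract describes: ranking first does not by itself imply a statistically significant advantage. Other post-hoc procedures with different power exist; Nemenyi is used here only because its critical values are short to tabulate.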
