Developmental Stage Classification of Embryos Using Two-Stream Neural Network with Linear-Chain Conditional Random Field

The developmental process of embryos follows a monotonic order: an embryo progressively cleaves from one cell into multiple cells and finally develops into a morula and then a blastocyst. For time-lapse videos of embryos, most existing developmental stage classification methods make per-frame predictions from the image frame at each time step. However, classification from images alone suffers from overlapping cells and from class imbalance across stages. Temporal information can help address these problems by capturing movement between neighboring frames. In this work, we propose a two-stream model for developmental stage classification. Unlike previous methods, our two-stream model takes both temporal and image information as input. We build a linear-chain conditional random field (CRF) on top of the neural network features extracted from the temporal and image streams to exploit both modalities. The linear-chain CRF formulation enables tractable training of global sequential models over multiple frames and also makes it possible to inject the monotonic developmental order constraint explicitly into the learning process. We demonstrate our algorithm on two time-lapse embryo video datasets: i) mouse and ii) human embryos. Our method achieves 98.1% and 80.6% for mouse and human embryo stage classification, respectively. Our approach will enable more profound clinical and biological studies and suggests a new direction for developmental stage classification that utilizes temporal information.
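The abstract states that the linear-chain CRF makes training tractable and lets the monotonic order be injected explicitly, but it does not spell out the mechanics. The following is a minimal NumPy sketch of one standard way to do this, not the authors' implementation. Several things are assumptions for illustration: the per-frame unary scores are taken to be a weighted sum of the two streams' stage scores (`fuse_streams` is a hypothetical helper), and the transition matrix is masked with negative infinity wherever the next stage is earlier than the current one. Stages may be skipped between frames but never revisited. Because the same mask enters both the forward algorithm (the log-partition term of the training loss) and Viterbi decoding, no probability mass or decoded path can ever regress to an earlier stage.

```python
import numpy as np

NEG_INF = -np.inf


def monotonic_mask(num_stages):
    """Transition mask: 0 where next stage j >= current stage i, -inf where j < i.

    Assumption: skipping stages between frames is allowed; going back is not.
    """
    i, j = np.indices((num_stages, num_stages))
    return np.where(j >= i, 0.0, NEG_INF)


def fuse_streams(image_scores, flow_scores, w_image=1.0, w_flow=1.0):
    """Hypothetical late fusion: weighted sum of per-frame (T, K) stage scores."""
    return w_image * image_scores + w_flow * flow_scores


def sequence_score(unary, trans, labels):
    """Unnormalized log-score of one stage sequence (unary + transition terms)."""
    score = unary[0, labels[0]]
    for t in range(1, len(labels)):
        score = score + trans[labels[t - 1], labels[t]] + unary[t, labels[t]]
    return score


def log_partition(unary, trans):
    """Forward algorithm: log-sum over all K**T sequences in O(T * K**2) time."""
    alpha = unary[0].copy()  # alpha[j]: log-sum of all prefixes ending in stage j
    for t in range(1, unary.shape[0]):
        # Log-sum over previous stage i of alpha[i] + trans[i, j], for every j.
        alpha = np.logaddexp.reduce(alpha[:, None] + trans, axis=0) + unary[t]
    return np.logaddexp.reduce(alpha)


def viterbi(unary, trans):
    """Highest-scoring stage sequence; masked transitions can never be chosen."""
    T, K = unary.shape
    delta = unary[0].copy()  # delta[j]: best prefix score ending in stage j
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + trans  # cand[i, j]: stage i at t-1, stage j at t
        backptr[t] = np.argmax(cand, axis=0)
        delta = cand[backptr[t], np.arange(K)] + unary[t]
    # Trace back from the best final stage.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


def crf_nll(unary, trans, labels):
    """Training loss for one video: log Z minus the score of the true sequence."""
    return log_partition(unary, trans) - sequence_score(unary, trans, labels)


# Toy usage: random scores stand in for CNN outputs (not real data).
rng = np.random.default_rng(0)
T, K = 8, 4
unary = fuse_streams(rng.normal(size=(T, K)), rng.normal(size=(T, K)))
trans = rng.normal(size=(K, K)) + monotonic_mask(K)
path = viterbi(unary, trans)
labels = sorted(int(x) for x in rng.integers(0, K, size=T))
print("decoded stages:", path)
print("NLL of a monotonic label sequence:", crf_nll(unary, trans, labels))
```

In a trained model, the unconstrained transition scores and any fusion weights would be learned jointly with the network features by minimizing this negative log-likelihood in an autodiff framework. NumPy is used here only to keep the sketch self-contained.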
