Inception LSTM for Next-frame Video Prediction (Student Abstract)

In this paper, we propose a novel deep-learning method called Inception LSTM for video frame prediction. A standard convolutional LSTM uses a single kernel size for each of its gates. Using multiple kernel sizes within a single gate provides richer features than a single kernel can capture. Our key idea is to introduce Inception-like kernels within the LSTM gates to capture features from a larger area of the image while retaining fine-resolution detail. We implemented the Inception LSTM within the PredNet network using both Inception version 1 and Inception version 2 modules, and evaluated it on both the KITTI and KTH datasets. Our results show that the Inception LSTM has better predictive performance than the convolutional LSTM. We also observe that the LSTM with Inception version 1 has better predictive performance than the LSTM with Inception version 2, whereas Inception version 2 has lower computational cost.
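The gate structure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the class name `InceptionLSTMCell`, the helper `conv2d_same`, the kernel sizes (1, 3, 5), the equal channel split across branches, and the random weight initialization are all assumptions made for illustration. The sketch also omits biases and the 1x1 reduction and pooling branches of a full Inception module. Each gate applies several kernel sizes to the concatenated input and hidden state, then concatenates the branch outputs along the channel axis.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix input channels for kernel offset (i, j) over the shifted window.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class InceptionLSTMCell:
    """Hypothetical convolutional LSTM cell with multi-kernel (Inception-like) gates."""

    def __init__(self, in_ch, hid_ch, kernel_sizes=(1, 3, 5), seed=0):
        # Assumption: every branch contributes an equal share of the hidden channels.
        assert hid_ch % len(kernel_sizes) == 0
        branch_ch = hid_ch // len(kernel_sizes)
        rng = np.random.default_rng(seed)
        # One weight bank per gate (input, forget, output, candidate) and kernel size.
        self.weights = {
            gate: [rng.normal(0.0, 0.1, (branch_ch, in_ch + hid_ch, k, k))
                   for k in kernel_sizes]
            for gate in "ifog"
        }

    def _gate(self, name, xh):
        # Run all kernel sizes in parallel and concatenate along channels.
        return np.concatenate([conv2d_same(xh, w) for w in self.weights[name]], axis=0)

    def step(self, x, h, c):
        xh = np.concatenate([x, h], axis=0)  # stack input and hidden state
        i = sigmoid(self._gate("i", xh))
        f = sigmoid(self._gate("f", xh))
        o = sigmoid(self._gate("o", xh))
        g = np.tanh(self._gate("g", xh))
        c_next = f * c + i * g
        h_next = o * np.tanh(c_next)
        return h_next, c_next


# Usage: one recurrent step on a single-channel 8x8 frame.
cell = InceptionLSTMCell(in_ch=1, hid_ch=6)
frame = np.ones((1, 8, 8))
h0 = np.zeros((6, 8, 8))
c0 = np.zeros((6, 8, 8))
h1, c1 = cell.step(frame, h0, c0)
```

A standard convolutional LSTM corresponds to `kernel_sizes` containing a single value; the wider kernels enlarge the receptive field of each gate while the 1x1 branch keeps per-pixel detail.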
