Siamese Spatio-Temporal Convolutional Neural Network for Stroke Classification in Table Tennis Games

This work presents a table tennis stroke classification approach based on a siamese spatio-temporal convolutional neural network (SSTCNN). The videos are recorded at 120 frames per second with players performing in natural conditions. The frames are extracted, resized, and processed to compute the optical flow. From the optical flow, a region of interest (ROI) is inferred. The SSTCNN is then fed the RGB and optical flow ROI streams and outputs a probabilistic classification over all table tennis strokes.
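
The abstract does not say how the ROI is inferred from the optical flow, so the following is only an illustrative sketch under stated assumptions: the ROI is taken to be a fixed-size window centred on the motion-weighted centroid of the flow field. The function names infer_roi and crop, the (H, W, 2) flow layout, and the 120 x 120 default window are hypothetical and are not taken from the paper.

```python
import numpy as np


def infer_roi(flow, roi_size=(120, 120)):
    """Return (top, left, height, width) of a fixed-size window.

    Assumption (not from the paper): the window is centred on the centroid
    of the optical-flow magnitude. `flow` has shape (H, W, 2).
    """
    h, w = flow.shape[:2]
    # Never ask for a window larger than the frame itself.
    rh, rw = min(roi_size[0], h), min(roi_size[1], w)

    mag = np.linalg.norm(flow, axis=2)  # per-pixel motion magnitude
    total = mag.sum()
    if total > 0:
        ys, xs = np.mgrid[0:h, 0:w]
        cy = float((ys * mag).sum() / total)
        cx = float((xs * mag).sum() / total)
    else:
        # No motion at all: fall back to the frame centre.
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0

    # Clamp so the window stays fully inside the frame.
    top = int(np.clip(round(cy - rh / 2.0), 0, h - rh))
    left = int(np.clip(round(cx - rw / 2.0), 0, w - rw))
    return top, left, rh, rw


def crop(frame, roi):
    """Crop an (H, W, C) array to the given ROI."""
    top, left, rh, rw = roi
    return frame[top:top + rh, left:left + rw]


# Synthetic example: a small moving patch in an otherwise static frame.
flow = np.zeros((240, 320, 2), dtype=np.float32)
flow[100:110, 200:210] = 1.0
rgb = np.zeros((240, 320, 3), dtype=np.uint8)

roi = infer_roi(flow)
rgb_roi = crop(rgb, roi)    # input to the RGB branch
flow_roi = crop(flow, roi)  # input to the optical flow branch
```

Applying the same window to both the RGB frame and the flow field keeps the two streams spatially aligned before they are passed to the two branches of the network.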
