Fall detection in multi-camera surveillance videos: experimentations and observations

This paper presents our study on fall detection for aged-care monitoring. We collected a choreographed multi-camera dataset that contains fall actions as well as other actions such as walking, standing up and sitting down. MoSIFT features are extracted from the videos recorded by each camera. We conduct a series of experiments to show how fall detection performance varies with the method used. We first compare the standard Bag-of-Words representation with the spatial Bag-of-Words representation at different codebook sizes. We then test different fusion methods that combine the information from the videos recorded by two orthogonally deployed cameras; in each case a non-linear χ2 kernel Support Vector Machine (SVM) is trained to detect fall actions. In addition, we use explicit feature maps with a linear kernel for fall detection and compare them to the standard Bag-of-Words representation with a non-linear χ2 kernel. Our experimental results show that late fusion of Bag-of-Words with a 1000-center codebook obtains the best performance. The best result reaches 90.46% average precision, a level of accuracy that may help provide a more independent and safer living environment for the elderly.
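The ingredients named above (a χ2 kernel on Bag-of-Words histograms, early versus late fusion of two camera views, and average precision as the metric) can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential form of the χ2 kernel, the `gamma` and `weight` parameters, the function names and the toy scores are all assumptions chosen for illustration, and the numbers are not results from the paper.

```python
import math


def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel between two non-negative histograms.

    Assumed form: exp(-gamma * sum((x_i - y_i)^2 / (x_i + y_i))).
    """
    dist = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both histograms
            dist += (a - b) ** 2 / (a + b)
    return math.exp(-gamma * dist)


def early_fusion(hist_cam1, hist_cam2):
    """Early fusion: concatenate per-camera histograms into one vector."""
    return list(hist_cam1) + list(hist_cam2)


def late_fusion(score_cam1, score_cam2, weight=0.5):
    """Late fusion: weighted average of the per-camera classifier scores."""
    return weight * score_cam1 + (1.0 - weight) * score_cam2


def average_precision(scores, labels):
    """Mean of the precision values measured at each positive in the ranking."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical per-camera scores for four clips; label 1 marks a fall.
cam1_scores = [0.9, 0.7, 0.4, 0.1]
cam2_scores = [0.4, 0.1, 0.8, 0.6]
labels = [1, 0, 1, 0]

# Fuse the two views at the score level, then evaluate the fused ranking.
fused = [late_fusion(a, b) for a, b in zip(cam1_scores, cam2_scores)]
ap_cam1 = average_precision(cam1_scores, labels)
ap_cam2 = average_precision(cam2_scores, labels)
ap_fused = average_precision(fused, labels)
```

In this toy case each camera ranks one non-fall clip above a fall clip, while the averaged scores rank both fall clips first. This shows how late fusion can help when the two views make different mistakes; it says nothing about how much it helps on real data.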
