Detecting human activities in retail surveillance using hierarchical finite state machine

Cashiers in retail stores usually exhibit certain repetitive and periodic activities when processing items. Detecting such activities plays a key role in most retail fraud detection systems. In this paper, we propose a highly efficient, effective and robust vision technique to detect checkout-related primitive activities, based on a hierarchical finite state machine (FSM). Our deterministic approach uses visual features and prior spatial constraints on the hand motion to capture particular motion patterns performed in primitive activities. We also apply our approach to the problem of retail fraud detection. Experimental results on a large set of video data captured from retail stores show that our approach, while much simpler and faster, achieves significantly better results than state-of-the-art machine learning-based techniques both in detecting checkout-related activities and in detecting checkout-related fraudulent incidents.

[1]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[2]  Stan Sclaroff,et al.  A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Luc Van Gool,et al.  Exploiting simple hierarchies for unsupervised human behavior analysis , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Sharath Pankanti,et al.  Recognition of repetitive sequential human activity , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Ivan Laptev,et al.  On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[7]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[8]  Sharath Pankanti,et al.  Fast detection of retail fraud using polar touch buttons , 2009, 2009 IEEE International Conference on Multimedia and Expo.

[9]  Adriana Kovashka,et al.  Learning a hierarchy of discriminative space-time neighborhood features for human action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[11]  Richard Bowden,et al.  A boosted classifier tree for hand shape detection , 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings..

[12]  Sharath Pankanti,et al.  Detecting sweethearting in retail surveillance videos , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.