Fast human-animal detection from highly cluttered camera-trap images using joint background modeling and deep learning classification

In this paper, we couple effective dynamic background modeling with deep learning classification to develop a fast and accurate scheme for human-animal detection from highly cluttered camera-trap images. Specifically, we first develop an effective background modeling and subtraction scheme to generate region proposals for the foreground objects. We then develop a cross-frame image patch verification step to reduce the number of foreground object proposals. Finally, we perform a complexity-accuracy analysis of deep convolutional neural networks (DCNNs) to develop a fast deep learning classification scheme that classifies these region proposals into three categories: human, animal, and background patches. The optimized DCNN maintains a high level of accuracy while reducing the computational complexity by a factor of 14. Our experimental results demonstrate that the proposed method outperforms existing methods on the camera-trap dataset.
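To make the first two stages of the pipeline concrete, the following is a minimal sketch and not the method of this paper. It assumes a per-pixel temporal median as the background model, a fixed intensity threshold for subtraction, and an overlap test against adjacent frames as the cross-frame verification; all function names, the threshold values, and the overlap criterion are hypothetical choices made for illustration. The DCNN classification stage is omitted.

```python
from statistics import median


def median_background(frames):
    """Assumed background model: per-pixel temporal median of grayscale frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)] for y in range(h)]


def foreground_boxes(frame, background, thresh=30, min_area=4):
    """Threshold |frame - background| and return boxes (x0, y0, x1, y1) of
    4-connected foreground regions containing at least min_area pixels."""
    h, w = len(frame), len(frame[0])
    mask = [[abs(frame[y][x] - background[y][x]) > thresh for x in range(w)]
            for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood fill one connected region starting at (y, x).
            stack, pixels = [(y, x)], []
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def verify_across_frames(boxes_per_frame, min_iou=0.1):
    """Assumed verification rule: keep a proposal only if a proposal in the
    previous or next frame overlaps it with IoU >= min_iou."""
    kept = []
    for i, boxes in enumerate(boxes_per_frame):
        neighbours = []
        if i > 0:
            neighbours += boxes_per_frame[i - 1]
        if i + 1 < len(boxes_per_frame):
            neighbours += boxes_per_frame[i + 1]
        kept.append([b for b in boxes if any(iou(b, n) >= min_iou for n in neighbours)])
    return kept
```

In this sketch, a blob that appears in a single frame with no overlapping proposal in the adjacent frames is discarded, while an object that moves smoothly through consecutive frames is retained. The surviving boxes would then be cropped and passed to a classifier that labels each patch as human, animal, or background.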