Convolutional Neural Network - Long Short-Term Memory Based IoT Node for Violence Detection

Violence detection has been investigated extensively in the literature. Recently, IoT-based violence video surveillance has become an intelligent component of the security systems of smart buildings. A violence video detector is a specific kind of detection model that must be highly accurate, combining high sensitivity with a low false alarm rate. This paper proposes a novel end-to-end CNN-LSTM (Convolutional Neural Network - Long Short-Term Memory) architecture that can run on a low-cost Internet of Things (IoT) device such as a Raspberry Pi board. The CNN learns spatial features from the video frames, and these features are fed to the LSTM, which classifies each video as violent or non-violent. A complex dataset combining two public datasets, RWF-2000 and RLVS-2000, was used for model training and evaluation. The challenging video content includes crowds and chaos, small objects at a far distance, low resolution, and transient actions. Additionally, the videos were captured in various environments, such as streets, prisons, and schools, and show several human actions, such as eating, swimming, and playing basketball, football, and tennis. The experimental results show good performance of the proposed violence detection model in terms of average metrics: an accuracy of 73.35 %, recall of 76.90 %, precision of 72.53 %, F1 score of 74.01 %, false negative rate of 23.10 %, false positive rate of 30.20 %, and AUC of 82.0 %. The proposed CNN-LSTM balances good performance with a low number of parameters and can thus be implemented on a low-cost IoT node.
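To make the reported metrics concrete, the following minimal Python sketch shows how accuracy, recall, precision, F1 score, false negative rate, and false positive rate follow from the confusion counts of a binary violence/non-violence classifier. The names `ConfusionCounts` and `binary_metrics` and the example counts are illustrative assumptions, not taken from the paper; AUC is omitted because it requires per-video scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion counts with violence as the positive class (hypothetical helper)."""
    tp: int  # violent videos classified as violent
    fp: int  # non-violent videos classified as violent (false alarms)
    tn: int  # non-violent videos classified as non-violent
    fn: int  # violent videos classified as non-violent (missed events)


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the count-based metrics named in the abstract."""
    recall = _ratio(c.tp, c.tp + c.fn)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "recall": recall,
        "precision": precision,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }


# Illustrative counts only (assumed, not the paper's data).
example = ConfusionCounts(tp=77, fp=30, tn=70, fn=23)
for name, value in binary_metrics(example).items():
    print(f"{name}: {value:.4f}")
```

By definition, recall and the false negative rate sum to one, which matches the reported pair of 76.90 % and 23.10 %. The averaged F1 score need not equal the harmonic mean of the averaged precision and recall, since averaging and the F1 formula do not commute.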