Development of Sound Identification System for Domestic Actions Recognition

Domestic action recognition is crucial for hazard detection in households. Sound identification can help a system understand its surrounding context: human voices or ambient noise may reveal a potential risk nearby. This paper presents a sound processing approach that classifies environmental sounds into 11 classes of domestic activity. The method extracts mel-cepstrum features into a $224\times 224$ pixel grayscale image and classifies that image with a convolutional neural network. Signal spectrum data are thus treated as images and processed with 2-dimensional convolutions in a shallow neural network architecture. In experiments on the DASEE database, the best recognition accuracy was 92.60%, surpassing approaches based on 1-dimensional convolutional models that operate on raw signal data.
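
The step of turning a feature matrix into a fixed-size grayscale image can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `features_to_grayscale`, the min-max normalization, and the nearest-neighbour resampling are all assumptions made for illustration, and the input is a random placeholder matrix rather than real mel-cepstral features.

```python
import numpy as np


def features_to_grayscale(features: np.ndarray, size: int = 224) -> np.ndarray:
    """Map a 2-D feature matrix (coefficients x frames) to a size x size uint8 image.

    Hypothetical helper: the normalization and resampling choices are
    assumptions, not details taken from the paper.
    """
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 2:
        raise ValueError("expected a 2-D feature matrix")

    # Min-max scale to [0, 1]; a constant matrix maps to all zeros.
    lo, hi = features.min(), features.max()
    span = hi - lo
    norm = (features - lo) / span if span > 0 else np.zeros_like(features)

    # Nearest-neighbour resampling: choose one source row and column
    # for every output pixel.
    rows = np.arange(size) * features.shape[0] // size
    cols = np.arange(size) * features.shape[1] // size
    resized = norm[np.ix_(rows, cols)]

    # Quantize to 8-bit grayscale intensities.
    return np.round(resized * 255).astype(np.uint8)


# Placeholder input: 40 coefficients over 300 frames (assumed sizes).
rng = np.random.default_rng(0)
placeholder = rng.normal(size=(40, 300))
image = features_to_grayscale(placeholder)
print(image.shape, image.dtype)  # (224, 224) uint8
```

An image produced this way has a single channel, so a 2-dimensional convolutional network would receive it as a $224\times 224\times 1$ input. Nearest-neighbour resampling is chosen here only because it keeps the sketch dependency-free; an interpolating resize may preserve more spectral detail.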