Three-layered hierarchical scheme with a Kinect sensor microphone array for audio-based human behavior recognition

The proposed three-layered hierarchical human behavior recognition scheme uses only audio data. The use of the Kinect sensor microphone array for data capture and fusion is explored. The proposed scheme increased recognition robustness compared with a conventional GMM.

This study develops a hierarchical scheme with three processing layers for human behavior recognition. The proposed scheme is an audio-based approach that uses the microphone array of the Kinect sensor to sense and acquire acoustic data for classifying human behavior. The three processing layers are interrelated: the feature layer, the acoustic event classification layer, and the specific behavior recognition layer. They employ, respectively, sensing data fusion, a Gaussian mixture model (GMM) with a classification tree, and a state machine diagram that regulates human behavior. Because the performance of the feature and acoustic event classification layers is enhanced, the proposed scheme achieves higher human behavior classification accuracy. Human behavior recognition experiments were conducted in a research office, and three specific office behavior modes, namely "Laboratory meeting," "Classmate chatting," and "Laboratory study interaction," were effectively classified using the proposed method.
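The three-layer flow described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the fusion rule (a plain average across microphone channels), the single diagonal Gaussian per event that stands in for the GMM with a classification tree, the event names (`single_speaker`, `overlapping_speech`, `speech_with_typing`), the `idle` start state, and the transition table. Only the three behavior mode names come from the abstract.

```python
import math


def fuse_channels(channel_features):
    """Layer 1 (feature layer): average the per-microphone feature vectors.

    Assumption: fusion is a plain mean across channels.
    """
    n = len(channel_features)
    return [sum(values) / n for values in zip(*channel_features)]


def log_likelihood(x, mean, var):
    """Log-density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def classify_event(x, models):
    """Layer 2 (acoustic event layer): pick the most likely event model.

    Simplification: one Gaussian per event instead of a full GMM plus tree.
    """
    return max(models, key=lambda name: log_likelihood(x, *models[name]))


# Layer 3 (behavior layer): a hypothetical state machine over event labels.
TRANSITIONS = {
    ("idle", "single_speaker"): "Laboratory meeting",
    ("idle", "overlapping_speech"): "Classmate chatting",
    ("idle", "speech_with_typing"): "Laboratory study interaction",
    ("Laboratory meeting", "overlapping_speech"): "Classmate chatting",
    ("Classmate chatting", "single_speaker"): "Laboratory meeting",
}


def recognize_behavior(events, start="idle"):
    """Follow the transition table; unknown (state, event) pairs keep the state."""
    state = start
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


# Hypothetical event models: name -> (mean vector, variance vector).
MODELS = {
    "single_speaker": ([0.0, 0.0], [1.0, 1.0]),
    "overlapping_speech": ([5.0, 5.0], [1.0, 1.0]),
}

# Two time frames, each holding one toy feature vector per microphone channel.
frames = [
    [[0.2, -0.1], [0.0, 0.3]],
    [[4.8, 5.1], [5.2, 4.9]],
]

events = [classify_event(fuse_channels(frame), MODELS) for frame in frames]
behavior = recognize_behavior(events)
print(events, behavior)
```

In this toy setup the behavior layer sees only event labels, so the lower layers can be replaced (for example, by a trained GMM per event) without touching the state machine.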
