Deep Learning-Based Violin Bowing Action Recognition

We propose a violin bowing action recognition system that can accurately recognize distinct bowing actions in classical violin performance. The system recognizes bowing actions by analyzing signals from a depth camera and from inertial sensors worn by a violinist. The contribution of this study is threefold: (1) a dataset of violin bowing actions was constructed from data captured by a depth camera and multiple inertial sensors; (2) data augmentation was performed on the depth-frame data through rotations in three-dimensional world coordinates and on the inertial sensing data through yaw, pitch, and roll angle transformations; and (3) bowing action classifiers were trained on the different modalities using deep learning methods and combined through a decision-level fusion process, so that the strengths of one modality compensate for the weaknesses of another. In experiments, both the large external motions and the subtle local motions produced by violin bow manipulation were accurately recognized by the proposed system (average accuracy > 80%).
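The two techniques named in the contributions, rotation-based augmentation of inertial data and decision-level fusion of per-modality classifiers, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ZYX Euler-angle convention, the `(T, 3)` sample layout, and the probability-averaging fusion rule are all assumptions for the sake of the example.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Build a 3-D rotation matrix from yaw, pitch, roll angles (radians).

    Assumes a ZYX Euler convention; the paper does not specify one.
    """
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

def augment_inertial(samples, yaw, pitch, roll):
    """Rotate a (T, 3) sequence of accelerometer/gyroscope readings.

    Simulates the same bowing action performed with the sensor worn at a
    slightly different orientation, yielding an augmented training sample.
    """
    R = rotation_matrix(yaw, pitch, roll)
    return samples @ R.T

def fuse_decisions(prob_depth, prob_inertial):
    """Decision-level fusion by averaging per-class probabilities.

    Averaging is one simple fusion rule; weighted or learned rules are
    equally possible. Returns the index of the fused winning class.
    """
    fused = (np.asarray(prob_depth) + np.asarray(prob_inertial)) / 2.0
    return int(np.argmax(fused))
```

Because the augmentation is a pure rotation, it preserves the magnitude of each reading, so the augmented sequence remains physically plausible while its axis decomposition changes.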
