Handwritten Music Symbol Classification Using Deep Convolutional Neural Networks

In this paper, we use deep Convolutional Neural Networks (CNNs) to classify handwritten music symbols in the HOMUS data set. The HOMUS data set consists of pen strokes that carry time information, so online techniques are expected to be better suited to its classification. However, our experiments show that a CNN, which does not use the time information, achieves a classification accuracy of about 94.6%, far above the 82% obtained by dynamic time warping (DTW), the previous state-of-the-art online technique. Finally, we obtain our best accuracy of about 95.6% with an ensemble of CNNs.
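To illustrate the offline (image-based) approach described above, the following is a minimal sketch in PyTorch of a small CNN classifier over rendered symbol images, together with a simple ensembling step. The input size (40x40 rendered stroke images), layer widths, class count, and the softmax-averaging ensemble are assumptions made for this sketch only; they are not the architecture or ensembling rule reported in the paper.

# Hypothetical sketch: a small CNN for offline classification of handwritten
# music symbols rendered as fixed-size single-channel images. The 40x40 input,
# layer widths, and num_classes default are illustrative assumptions, not
# values taken from the paper.
import torch
import torch.nn as nn

class SymbolCNN(nn.Module):
    def __init__(self, num_classes: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # rendered stroke image, 1 channel
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                               # 40x40 -> 20x20
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                               # 20x20 -> 10x10
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 10 * 10, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One common way to combine several independently trained CNNs: average their
# softmax outputs and take the arg max. The paper's exact combination rule may differ.
def ensemble_predict(models, images):
    with torch.no_grad():
        probs = torch.stack([m(images).softmax(dim=1) for m in models]).mean(dim=0)
    return probs.argmax(dim=1)

In a sketch like this, averaging softmax outputs is a straightforward ensembling choice because it requires no extra training; each member network only needs to be trained independently on the same rendered images.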
