Robust Piano Music Transcription Based on Computer Vision

Automatic music transcription, which aims to convert acoustic music signals into symbolic notation, has recently attracted increasing attention. To address the challenges of transcription based on acoustic information alone, visual approaches have been developed; traditional ones adopt the Hough transform to locate the piano keyboard and a weak classifier to detect pressed keys. However, the Hough transform and the weak classifier show insufficient detection ability in changing environments. In this paper, we devise a robust visual piano transcription system that uses semantic segmentation for piano keyboard detection and a CNN-based classifier to detect pressed keys, which improves frame-level transcription results. In addition, given the lack of public datasets for visual piano transcription, we propose a new dataset for this task. To demonstrate the effectiveness of our system, we evaluate it on both a previously published dataset and our proposed dataset, where it significantly outperforms state-of-the-art approaches.
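As a rough illustration of what a frame-level comparison can look like, the sketch below computes precision, recall, and F1 over per-frame sets of pressed keys. This is an assumption for illustration only: the abstract does not state which frame-level metrics are used, and the function name `frame_level_scores` and the representation of each frame as a set of key indices are hypothetical.

```python
def frame_level_scores(predicted, reference):
    """Return (precision, recall, f1) over all frames.

    predicted, reference: lists of sets, one set of pressed-key
    indices per video frame (hypothetical representation).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")

    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)   # keys correctly detected as pressed
        fp += len(pred - ref)   # keys detected but not actually pressed
        fn += len(ref - pred)   # pressed keys that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Three frames; key indices are arbitrary example values.
    predicted = [{60, 64}, {60}, set()]
    reference = [{60, 64}, {62}, {67}]
    print(frame_level_scores(predicted, reference))
```

Counts are pooled across frames before the ratios are taken, so frames with many pressed keys weigh more than frames with few.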
