Paper Information - Observing Pianist Accuracy and Form with Computer Vision

We present a first step towards an interactive piano tutoring system that can observe a student playing the piano and give feedback on hand movements and musical accuracy. We have two primary aims: 1) to determine which notes on the piano are being played at any moment in time, and 2) to identify which finger is pressing each note. We introduce a novel two-stream convolutional neural network that takes video and audio inputs together to detect pressed notes and finger presses. We formulate the two problems as multi-task learning and extend a state-of-the-art object detection model to incorporate both audio and visual features. In addition, we introduce a novel finger identification solution based on pressed piano note information. We experimentally confirm that our approach detects pressed piano keys and the piano player's fingers with high accuracy.
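The abstract does not spell out how pressed-note information is used to identify fingers. As a purely illustrative sketch, and not the authors' method, one simple way to combine the two outputs is to assign each pressed key to the fingertip that is horizontally closest to it. The `Fingertip` type, the `assign_fingers` function, and the pixel-coordinate convention below are all hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class Fingertip:
    """Hypothetical fingertip detection (not from the paper)."""
    hand: str    # "L" or "R"
    finger: int  # 1 = thumb ... 5 = little finger
    x: float     # horizontal position in keyboard image coordinates


def assign_fingers(pressed_keys, fingertips):
    """Map each pressed key to its horizontally nearest fingertip.

    pressed_keys: dict of MIDI note number -> x centre of that key (assumed input)
    fingertips:   list of Fingertip detections for the same video frame
    Returns a dict of MIDI note number -> (hand, finger).
    """
    if not fingertips:
        return {}
    result = {}
    for note, key_x in pressed_keys.items():
        # Nearest-neighbour rule: a simplifying assumption for illustration.
        nearest = min(fingertips, key=lambda f: abs(f.x - key_x))
        result[note] = (nearest.hand, nearest.finger)
    return result


if __name__ == "__main__":
    tips = [Fingertip("R", 1, 100.0), Fingertip("R", 2, 120.0), Fingertip("R", 3, 140.0)]
    print(assign_fingers({60: 102.0, 64: 138.0}, tips))
```

A nearest-neighbour rule like this ignores depth and occlusion, so it should be read only as an intuition for why knowing the pressed keys narrows down the candidate fingers.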
