claVision: visual automatic piano music transcription

An important problem in Music Information Retrieval is Automatic Music Transcription: the automated conversion of played music into a symbolic notation such as sheet music. Because the accuracy of previous audio-based transcription systems is not satisfactory, we propose claVision, a vision-based automatic music transcription system for piano music. Instead of processing the audio, the system performs the transcription solely from video of the performance, captured by a camera mounted over the piano keyboard. claVision can be used as a transcription tool, and it also has other applications such as music education. The claVision software achieves high accuracy (over 95%) and very low latency in real-time music transcription, even under different illumination conditions.
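To make the idea of transcribing from video alone more concrete, here is a minimal sketch of one way pressed keys might be detected from an overhead camera. It is not the claVision algorithm, which this abstract does not describe. It assumes a grayscale reference frame of the idle keyboard, a known column range for each key, and a simple mean-difference threshold. The names `pressed_keys`, `key_to_midi`, `key_columns`, and `threshold` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# A grayscale frame as rows of 0-255 intensities (assumed representation).
Frame = Sequence[Sequence[int]]


def pressed_keys(
    background: Frame,
    current: Frame,
    key_columns: Sequence[Tuple[int, int]],
    threshold: float = 20.0,
) -> List[int]:
    """Return indices of keys whose region differs from the idle keyboard.

    Assumption: each key occupies the column range [start, stop) across all
    rows, and a pressed key changes the mean intensity of its region by more
    than `threshold`.
    """
    pressed = []
    for key_index, (start, stop) in enumerate(key_columns):
        total = 0
        count = 0
        for bg_row, cur_row in zip(background, current):
            for x in range(start, stop):
                total += abs(cur_row[x] - bg_row[x])
                count += 1
        # Mean absolute difference over the key's region.
        if count and total / count > threshold:
            pressed.append(key_index)
    return pressed


def key_to_midi(key_index: int, lowest_midi_note: int = 21) -> int:
    """Map a key index to a MIDI note number.

    On a standard 88-key piano the lowest key is A0, MIDI note 21.
    """
    return lowest_midi_note + key_index


# Toy example: three keys, two pixel columns each; the middle key darkens.
idle = [[200] * 6 for _ in range(2)]
frame = [[200, 200, 120, 120, 200, 200] for _ in range(2)]
columns = [(0, 2), (2, 4), (4, 6)]

keys = pressed_keys(idle, frame, columns)
notes = [key_to_midi(k) for k in keys]
```

A real system would also need to locate and rectify the keyboard in the image, handle the pianist's hands occluding keys, and cope with lighting changes; the fixed threshold above ignores all of these.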
