Isolated guitar transcription using a deep belief network

Music transcription involves the transformation of an audio recording to common music notation, colloquially referred to as sheet music. Manually transcribing audio recordings is a difficult and time-consuming process, even for experienced musicians. In response, several algorithms have been proposed to automatically analyze and transcribe the notes sounding in an audio recording; however, these algorithms are often general-purpose, attempting to process any number of instruments producing any number of notes sounding simultaneously. This paper presents a polyphonic transcription algorithm that is constrained to processing the audio output of a single instrument, specifically an acoustic guitar. The transcription system consists of a novel note pitch estimation algorithm that uses a deep belief network and multi-label learning techniques to generate multiple pitch estimates for each analysis frame of the input audio signal. Evaluated on a compiled dataset of synthesized guitar recordings, the algorithm described in this work achieves an 11% increase in the f-measure of note transcriptions relative to the transcription algorithm of Zhou et al. (2009). This paper demonstrates the effectiveness of deep, multi-label learning for the task of polyphonic transcription.

Subjects: Data Mining and Machine Learning, Data Science
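As a rough illustration of the two ideas named in the abstract, multi-label pitch estimation per analysis frame and f-measure evaluation, the following Python sketch may help. It is not the paper's implementation: the deep belief network is replaced by hard-coded per-pitch probabilities, the 0.5 threshold and the MIDI offset of 40 (E2, the lowest string of a standard-tuned guitar) are assumptions, and the f-measure is computed over frames rather than over note events as in the paper. The function names `estimate_pitches` and `f_measure` are hypothetical.

```python
from typing import List, Set, Tuple


def estimate_pitches(frame_probs: List[float],
                     threshold: float = 0.5,
                     lowest_midi: int = 40) -> Set[int]:
    """Multi-label decision for one frame: keep every pitch whose
    probability meets the threshold (assumed value, not from the paper)."""
    return {lowest_midi + i for i, p in enumerate(frame_probs) if p >= threshold}


def f_measure(predicted: List[Set[int]],
              reference: List[Set[int]]) -> Tuple[float, float, float]:
    """Frame-level precision, recall, and f-measure over pitch sets."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)   # pitches correctly estimated
        fp += len(pred - ref)   # pitches estimated but not sounding
        fn += len(ref - pred)   # sounding pitches that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Stand-in for network output: 3 frames, 4 candidate pitches (MIDI 40-43).
probs = [
    [0.9, 0.1, 0.8, 0.2],
    [0.6, 0.7, 0.1, 0.1],
    [0.2, 0.1, 0.3, 0.9],
]
truth = [{40, 42}, {41}, {42, 43}]

estimates = [estimate_pitches(frame) for frame in probs]
precision, recall, f = f_measure(estimates, truth)
print(estimates, round(precision, 2), round(recall, 2), round(f, 2))
```

Thresholding each pitch independently is what allows several notes to be reported for the same frame, which is the property a polyphonic instrument such as the guitar requires.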

[1]  Lei Tang,et al.  Large scale multi-label classification via metalabeler , 2009, WWW '09.

[2]  J. Beauchamp,et al.  Fundamental frequency estimation of musical signals using a two‐way mismatch procedure , 1994 .

[3]  Tillman Weyde,et al.  Explicit Duration Hidden Markov Models for Multiple-Instrument Polyphonic Music Transcription , 2013, ISMIR.

[4]  Tania Lombrozo Computational Modeling of Chord Fingering for String Instruments , 2005 .

[5]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .

[6]  Keith D. Martin,et al.  A Blackboard System for Automatic Transcription of Simple Polyphonic Music , 1996 .

[7]  Walter D. Potter,et al.  An Evolved Neural Network/HC Hybrid for Tablature Creation in GA-based Guitar Arranging , 2006, ICMC.

[8]  Gregory D. Burlet Guitar Tablature Transcription using a Deep Belief Network , 2015 .

[9]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[10]  Eric Singer,et al.  LEMUR GuitarBot: MIDI Robotic String Instrument , 2003, NIME.

[11]  Mark B. Sandler,et al.  A tutorial on onset detection in music signals , 2005, IEEE Transactions on Speech and Audio Processing.

[12]  Anssi Klapuri,et al.  Automatic Transcription of Guitar Chords and Fingering From Audio , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[13]  George Tzanetakis,et al.  MARSYAS: a framework for audio analysis , 1999, Organised Sound.

[14]  Paul E. Utgoff,et al.  Many-Layered Learning , 2002, Neural Computation.

[15]  Malcolm D. Macleod,et al.  The Automated Music Transcription Problem , 2004 .

[16]  Juhan Nam,et al.  A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations , 2011, ISMIR.

[17]  M.P. Ryynanen,et al.  Polyphonic music transcription using note event modeling , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[18]  Axel Röbel,et al.  Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[19]  Anssi Klapuri,et al.  Automatic Music Transcription: Breaking the Glass Ceiling , 2012, ISMIR.

[20]  Jeffrey G. Andrews,et al.  Mode Switching for the Multi-Antenna Broadcast Channel Based on Delay and Channel Quantization , 2008, EURASIP J. Adv. Signal Process..

[21]  Peter F. Driessen,et al.  Path Difference Learning for Guitar Fingering Problem , 2004, ICMC.

[22]  Vincenzo Lombardo,et al.  Computational Model of Chord Fingering , 2005 .

[23]  Nicolas Boulanger-Lewandowski Modeling High-Dimensional Audio Sequences with Recurrent Neural Networks , 2014 .

[24]  Hank Heijink,et al.  On the Complexity of Classical Guitar Playing: Functional Adaptations to Task Constraints , 2002, Journal of motor behavior.

[25]  Anssi Klapuri,et al.  Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes , 2006, ISMIR.

[26]  Christopher Raphael,et al.  Automatic Transcription of Piano Music , 2002, ISMIR.

[27]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[28]  Daniel P. W. Ellis,et al.  IMPROVING GENERALIZATION FOR POLYPHONIC PIANO TRANSCRIPTION , 2007 .

[29]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[30]  Walter D. Potter,et al.  A Genetic Algorithm for the Automatic Generation of Playable Guitar Tablature , 2005, ICMC.

[31]  Graham E. Poliner,et al.  Improving Generalization for Classification-Based Polyphonic Piano Transcription , 2007, 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[32]  Giovanni Costantini,et al.  Event based transcription system for polyphonic piano music , 2009, Signal Process..

[33]  Geoffrey E. Hinton Learning multiple layers of representation , 2007, Trends in Cognitive Sciences.

[34]  Anssi Klapuri,et al.  Automatic music transcription: challenges and future directions , 2013, Journal of Intelligent Information Systems.

[35]  Yoshua Bengio,et al.  BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 , 2016, ArXiv.

[36]  Joshua D. Reiss,et al.  A REAL-TIME POLYPHONIC MUSIC TRANSCRIPTION SYSTEM , 2008 .

[37]  Gregory Burlet,et al.  Robotaba Guitar Tablature Transcription Framework , 2013, ISMIR.

[38]  Guillaume Lemaitre,et al.  Real-time Polyphonic Music Transcription with Non-negative Matrix Factorization and Beta-divergence , 2010, ISMIR.

[39]  Yann LeCun,et al.  Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics , 2012, ISMIR.

[40]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[41]  S. Dixon ONSET DETECTION REVISITED , 2006 .

[42]  Yoshua Bengio,et al.  Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[43]  Joshua D. Reiss,et al.  A Computationally Efficient Method for Polyphonic Pitch Estimation , 2009, EURASIP J. Adv. Signal Process..

[44]  Yann LeCun,et al.  Feature learning and deep architectures: new directions for music informatics , 2013, Journal of Intelligent Information Systems.

[45]  Juan Pablo Bello,et al.  From music audio to chord tablature: Teaching deep convolutional networks to play guitar , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[46]  Simon Dixon,et al.  An End-to-End Neural Network for Polyphonic Piano Music Transcription , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[47]  Matija Marolt,et al.  A connectionist approach to automatic transcription of polyphonic piano music , 2004, IEEE Transactions on Multimedia.

[48]  P. Smaragdis,et al.  Non-negative matrix factorization for polyphonic music transcription , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[49]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.

[50]  A.P. Klapuri,et al.  A perceptually motivated multiple-F0 estimation method , 2005, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005..

[51]  James Anderson Moorer,et al.  On the segmentation and analysis of continuous musical sound by digital computer , 1975 .

[52]  Alex Acero,et al.  Spoken Language Processing: A Guide to Theory, Algorithm and System Development , 2001 .

[53]  Anssi Klapuri,et al.  Automatic Music Transcription as We Know it Today , 2004 .