From Audio to Music Notation

The field of Music Information Retrieval (MIR) focuses on creating methods and practices for making sense of music data from various modalities, including audio, video, images, scores, and metadata [54]. Within MIR, a core problem that remains open to this day is Automatic Music Transcription (AMT): the process of automatically converting an acoustic music signal into some form of musical notation. A method for converting musical audio to notation has many uses within and beyond MIR: software for automatically typesetting audio into staff notation or other music representations; automatic transcriptions as descriptors for music recommendation systems; interactive music applications such as automatic accompaniment; automatic instrument tutoring for music education; and musicological research in sound archives, to name but a few.

The first attempts to address the problem date back to the 1970s and the dawn of the field of computer music (e.g., [47]). AMT saw a resurgence in the mid-2000s with the development of methods for audio signal processing and pattern recognition, and a second wave of popularity in recent years following the emergence of deep learning. These advances in artificial intelligence, and in deep learning in particular, have led to new applications and systems, as well as to a new set of technical, methodological, and ethical challenges. Irrespective of the methodologies used to investigate and develop tools and practices for AMT, researchers addressing the task draw knowledge from several disciplines, including digital signal processing, machine learning, music perception, and music theory.

This chapter presents state-of-the-art research and open topics in AMT, focusing on recent deep learning methods for addressing the task, and outlining challenges and directions for future research.
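To make the common framewise ("piano-roll") formulation of AMT concrete, the sketch below maps audio to a time-frequency representation and then predicts per-frame pitch activations with a small neural network. This is a minimal illustration under stated assumptions, not the method of any system cited in this chapter: the model architecture, hyperparameters, and the synthetic two-note input are all placeholders chosen for demonstration.

```python
# Minimal sketch of framewise AMT: audio -> log-frequency spectrogram ->
# per-frame pitch activations. All names and settings are illustrative.
import numpy as np
import librosa
import torch
import torch.nn as nn

SR = 22050
HOP = 512
N_PITCHES = 88                 # piano range, MIDI notes 21-108
BINS_PER_OCTAVE = 36           # 3 CQT bins per semitone
N_BINS = N_PITCHES * 3

# Stand-in audio: a synthetic two-note mixture (A4 + C#5) instead of a recording.
t = np.arange(SR * 3) / SR
y = 0.5 * np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 554.37 * t)

# Log-magnitude constant-Q transform, a common input representation for AMT.
C = np.abs(librosa.cqt(y, sr=SR, hop_length=HOP,
                       fmin=librosa.note_to_hz('A0'),
                       n_bins=N_BINS, bins_per_octave=BINS_PER_OCTAVE))
X = librosa.amplitude_to_db(C, ref=np.max).T   # shape: (frames, bins)

class FramewiseTranscriber(nn.Module):
    """Toy frame-level model: each spectrogram frame -> 88 pitch probabilities."""
    def __init__(self, n_bins, n_pitches):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, n_pitches),
        )

    def forward(self, x):
        # Sigmoid (not softmax): multiple pitches may sound simultaneously.
        return torch.sigmoid(self.net(x))

model = FramewiseTranscriber(N_BINS, N_PITCHES)
with torch.no_grad():
    activations = model(torch.tensor(X, dtype=torch.float32))

# Thresholding the (here untrained) activations yields a binary piano roll.
# A real system would train with per-frame binary cross-entropy against
# aligned ground-truth rolls, then post-process frames into note events.
piano_roll = (activations > 0.5).numpy()
print(piano_roll.shape)        # (n_frames, 88)
```

The framewise formulation sidesteps note segmentation during training, which is one reason it is a popular baseline; converting frame activations into discrete notes, and notes into typeset notation, are the harder downstream steps discussed later in the chapter.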

References

[1] Adrien Ycart et al., Blending Acoustic and Language Model Predictions for Automatic Music Transcription, ISMIR, 2019.
[2] Jorge Calvo-Zaragoza et al., A Holistic Approach to Polyphonic Music Transcription with Neural Networks, ISMIR, 2019.
[3] Jonathan Le Roux et al., Cutting Music Source Separation Some Slakh: A Dataset to Study the Impact of Training Data Quality and Quantity, IEEE WASPAA, 2019.
[4] Juan Pablo Bello et al., Adversarial Learning for Improved Onsets and Frames Music Transcription, ISMIR, 2019.
[5] Andrew McLeod et al., Evaluating Non-aligned Musical Score Transcriptions with MV2H, arXiv, 2019.
[6] Masataka Goto et al., Automatic Singing Transcription Based on Encoder-decoder Recurrent Neural Networks with a Weakly-supervised Attention Mechanism, IEEE ICASSP, 2019.
[7] Simon Dixon et al., Automatic Music Transcription: An Overview, IEEE Signal Processing Magazine, 2019.
[8] Gerhard Widmer et al., Multitask Learning for Polyphonic Piano Transcription, a Case Study, International Workshop on Multilayer Music Representation and Processing (MMRP), 2019.
[9] Cheng-Zhi Anna Huang et al., Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset, ICLR, 2018.
[10] Adrien Ycart et al., A-MAPS: Augmented MAPS Dataset with Rhythm and Key Annotations, 2018.
[11] Li Su et al., Learning Domain-Adaptive Latent Representations of Music Signals Using Variational Autoencoders, ISMIR, 2018.
[12] Juan Pablo Bello et al., Multitask Learning for Fundamental Frequency Estimation in Music, arXiv, 2018.
[13] Gerhard Widmer et al., A Review of Automatic Drum Transcription, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018.
[14] Eita Nakamura et al., Towards Complete Polyphonic Music Transcription: Integrating Multi-Pitch Detection and Rhythm Quantization, IEEE ICASSP, 2018.
[15] Qi Wang et al., Polyphonic Piano Transcription with a Note-Based Music Language Model, 2018.
[16] Colin Raffel et al., Onsets and Frames: Dual-Objective Piano Transcription, ISMIR, 2017.
[17] Adrien Ycart et al., A Study on LSTM Networks for Polyphonic Music Sequence Modelling, ISMIR, 2017.
[18] Paris Smaragdis et al., Towards end-to-end polyphonic music transcription: Transforming music audio directly to a score, IEEE WASPAA, 2017.
[19] Mark D. Plumbley et al., Computational Analysis of Sound Scenes and Events, 2017.
[20] Sebastian Ruder et al., An Overview of Multi-Task Learning in Deep Neural Networks, arXiv, 2017.
[21] Brendt Wohlberg et al., Piano Transcription With Convolutional Sparse Lateral Inhibition, IEEE Signal Processing Letters, 2017.
[22] Gerhard Widmer et al., An Experimental Analysis of the Entanglement Problem in Neural-Network-based Music Transcription Systems, Semantic Audio, 2017.
[23] Eita Nakamura et al., Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017.
[24] Gerhard Widmer et al., On the Potential of Simple Framewise Approaches to Piano Transcription, ISMIR, 2016.
[25] Zaïd Harchaoui et al., Learning Features of Music from Scratch, ICLR, 2016.
[26] Emilia Gómez et al., Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music, 2016.
[27] Tillman Weyde et al., An Efficient Temporally-Constrained Probabilistic Model for Multiple-Instrument Music Transcription, ISMIR, 2015.
[28] Simon Dixon et al., An End-to-End Neural Network for Polyphonic Piano Music Transcription, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
[29] Yi-Hsuan Yang et al., Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription, CMMR, 2015.
[30] Dong Yu et al., Automatic Speech Recognition: A Deep Learning Approach, 2014.
[31] Matthias Mauch et al., MedleyDB: A Multitrack Dataset for Annotation-Intensive MIR Research, ISMIR, 2014.
[32] Daniel P. W. Ellis et al., Melody Extraction from Polyphonic Music Signals: Approaches, applications, and challenges, IEEE Signal Processing Magazine, 2014.
[33] Tijl De Bie et al., Automatic Chord Estimation from Audio: A Review of the State of the Art, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.
[34] Anssi Klapuri et al., Automatic music transcription: challenges and future directions, Journal of Intelligent Information Systems, 2013.
[35] Emmanouil Benetos et al., Automatic Transcription of Turkish Makam Music, ISMIR, 2013.
[36] Emilia Gómez et al., Towards Computer-Assisted Flamenco Transcription: An Experimental Comparison of Automatic Transcription Algorithms as Applied to A Cappella Singing, Computer Music Journal, 2013.
[37] Yoshua Bengio et al., Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription, ICML, 2012.
[38] Markus Schedl et al., Polyphonic piano note transcription with recurrent neural networks, IEEE ICASSP, 2012.
[39] Carlos Guedes et al., Optical music recognition: state-of-the-art and open issues, International Journal of Multimedia Information Retrieval, 2012.
[40] Changshui Zhang et al., Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions, IEEE Transactions on Audio, Speech, and Language Processing, 2010.
[41] Roland Badeau et al., Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle, IEEE Transactions on Audio, Speech, and Language Processing, 2010.
[42] Mert Bay et al., Evaluation of Multiple-F0 Estimation and Tracking Systems, ISMIR, 2009.
[43] Valentin Emiya et al., Perceptually-Based Evaluation of the Errors Usually Made When Automatically Transcribing Music, ISMIR, 2008.
[44] Anssi Klapuri et al., Signal Processing Methods for Music Transcription, 2006.
[45] Mark B. Sandler et al., A tutorial on onset detection in music signals, IEEE Transactions on Speech and Audio Processing, 2005.
[46] Masataka Goto et al., RWC Music Database: Music genre database and musical instrument sound database, ISMIR, 2003.
[47] W. M. Hartmann, Pitch, periodicity, and auditory organization, The Journal of the Acoustical Society of America, 1996.
[48] Rachel M. Bittner et al., Generalized Metrics for Single-f0 Estimation Evaluation, ISMIR, 2019.
[49] Brian McFee et al., OpenMIC-2018: An Open Data-set for Multiple Instrument Recognition, ISMIR, 2018.
[50] Johan Pauwels et al., GuitarSet: A Dataset for Guitar Transcription, ISMIR, 2018.
[51] Daniel Scharstein et al., Automatic Music Transcription, 2018.
[52] Mark Steedman et al., Evaluating Automatic Polyphonic Music Transcription, ISMIR, 2018.
[53] Justin Salamon et al., Deep Salience Representations for F0 Estimation in Polyphonic Music, ISMIR, 2017.
[54] Zhiyao Duan et al., A Metric for Music Notation Transcription Accuracy, ISMIR, 2017.
[55] Colin Raffel et al., Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching, 2016.
[56] Zhiyao Duan et al., Transcribing Human Piano Performances into Music Notation, ISMIR, 2016.
[57] C. H. Chen et al., Handbook of Pattern Recognition and Computer Vision, 5th Ed., 2016.
[58] Zhiyao Duan et al., Note-level Music Transcription by Maximum Likelihood Sampling, ISMIR, 2014.
[59] Emilio Molina et al., Evaluation Framework for Automatic Singing Transcription, ISMIR, 2014.
[60] Xavier Serra et al., Roadmap for Music Information ReSearch, 2013.
[61] Juhan Nam et al., A Classification-Based Polyphonic Piano Transcription Approach Using Learned Feature Representations, ISMIR, 2011.
[62] Daniel P. W. Ellis et al., A Discriminative Model for Polyphonic Piano Transcription, EURASIP Journal on Advances in Signal Processing, 2007.
[63] P. Smaragdis et al., Non-negative matrix factorization for polyphonic music transcription, IEEE WASPAA, 2003.