Evolving a Multi-Classifier System for Multi-Pitch Estimation of Piano Music and Beyond: An Application of Cartesian Genetic Programming

This paper presents a new method, with a set of desirable properties, for multi-pitch estimation of piano recordings. We propose a framework in which a set of classifiers analyzes the audio input and identifies the piano notes present in the signal. The classifiers are evolved using Cartesian genetic programming, which produces a set of mathematical functions that act as independent classifiers for piano notes. Two significant improvements are described: a harmonic mask that yields better fitness values and a data augmentation process that improves the training stage. Measured by F-measure, the proposed approach achieves results competitive with state-of-the-art algorithms. We then go beyond piano and show that the approach can be applied directly to other musical instruments, achieving even better results. We also describe the system's architecture to show that it can be parallelized and implemented as a real-time system. Finally, the methodology is a white-box optimization approach: the solutions found can be analyzed clearly, and researchers can learn from them and test improvements based on the new findings.
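The idea of an evolved mathematical function acting as a per-note classifier, scored by F-measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the function set, the single-row genotype encoding, the zero decision threshold, and the names `evaluate_cgp`, `classify`, and `f_measure` are all assumptions made for illustration, and the harmonic mask and data augmentation steps are left out.

```python
import numpy as np

# Hypothetical function set; the abstract does not list the functions actually used.
FUNCTIONS = [
    lambda a, b: a + b,
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: np.maximum(a, b),
]


def evaluate_cgp(genotype, output_gene, inputs):
    """Evaluate a single-row Cartesian genetic programming graph.

    genotype: list of (function_index, source_a, source_b), one tuple per node.
    Sources index the program inputs first, then previously computed nodes.
    inputs: array of shape (n_inputs, n_frames), e.g. spectral features per frame.
    For simplicity every node is evaluated, including inactive ones.
    """
    values = [row for row in inputs]
    for func_idx, src_a, src_b in genotype:
        values.append(FUNCTIONS[func_idx](values[src_a], values[src_b]))
    return values[output_gene]


def classify(genotype, output_gene, inputs, threshold=0.0):
    """Binary decision per frame: is this classifier's note present?"""
    return evaluate_cgp(genotype, output_gene, inputs) > threshold


def f_measure(predicted, reference):
    """F-measure between boolean prediction and reference arrays."""
    tp = np.sum(predicted & reference)
    fp = np.sum(predicted & ~reference)
    fn = np.sum(~predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))
```

In a multi-classifier setup of this kind, one such genotype would be evolved per note, with `f_measure` (or a variant of it) serving as the fitness value. Because each classifier depends only on the shared input features, the classifiers can be evaluated independently of one another.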
