Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching

[1] M. G. Kendall. A new measure of rank correlation. 1938.

[2] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. 1943.

[3] D. O. Hebb. The organization of behavior: A neuropsychological theory. 1949.

[4] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 1958.

[5] B. T. Polyak. Some methods of speeding up the convergence of iteration methods. 1964.

[6] J. B. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. 1964.

[7] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory, 1967.

[8] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. 1978.

[9] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization. 1983.

[10] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Nature, 1986.

[11] F. R. Moore. The Dysfunctions of MIDI. ICMC, 1988.

[12] S. J. Hanson and L. Y. Pratt. Comparing Biases for Minimal Network Construction with Back-Propagation. NIPS, 1988.

[13] G. Cybenko. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst., 1989.

[14] Y. LeCun et al. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1989.

[15] K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 1989.

[16] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The Philosophy of Artificial Intelligence, 1990.

[17] P. J. Werbos. Backpropagation Through Time: What It Does and How to Do It. Proc. IEEE, 1990.

[18] J. C. Brown. Calculation of a constant Q spectral transform. 1991.

[19] J. C. Brown and M. S. Puckette. An efficient algorithm for the calculation of a constant Q transform. 1992.

[20] D. J. Berndt and J. Clifford. Using Dynamic Time Warping to Find Patterns in Time Series. KDD Workshop, 1994.

[21] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Networks, 1994.

[22] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 1997.

[23] P. R. Cook (Ed.). Music, cognition, and computerized sound: an introduction to psychoacoustics. 1999.

[24] T. Fujishima. Realtime Chord Recognition of Musical Sound: a System Using Common Lisp Music. ICMC, 1999.

[25] C. L. Yip et al. Selection of melody lines for music databases. COMPSAC, 2000.

[26] E. Jones et al. SciPy: Open Source Scientific Tools for Python. 2001.

[27] M. Good. MusicXML for notation and analysis. 2001.

[28] S. Dixon. Automatic Extraction of Tempo and Beat From Expressive Performances. 2001.

[29] E. J. Keogh et al. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Knowledge and Information Systems, 2001.

[30] G. Tzanetakis and P. Cook. Musical genre classification of audio signals. IEEE Trans. Speech Audio Process., 2002.

[31] N. Hu, R. B. Dannenberg, and G. Tzanetakis. Polyphonic audio matching and alignment for music retrieval. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2003.

[32] R. J. Turetsky and D. P. W. Ellis. Ground-truth transcriptions of real music from force-aligned MIDI syntheses. ISMIR, 2003.

[33] M. Bosi and R. E. Goldberg. Introduction to Digital Audio Coding and Standards. J. Electronic Imaging, 2004.

[34] C. A. Ratanamahatana and E. J. Keogh. Everything you know about Dynamic Time Warping is Wrong. 2004.

[35] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 1980.

[36] A. Berenzweig et al. A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures. Computer Music Journal, 2004.

[37] S. Dixon and G. Widmer. MATCH: A Music Alignment Tool Chest. ISMIR, 2005.

[38] M. I. Mandel and D. P. W. Ellis. Song-Level Features and Support Vector Machines for Music Classification. ISMIR, 2005.

[39] J. P. Bello et al. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 2005.

[40] G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, 2006.

[41] C. McKay and I. Fujinaga. jSymbolic: A Feature Extractor for MIDI Files. ICMC, 2006.

[42] G. E. Hinton, S. Osindero, and Y.-W. Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[43] A. Graves et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. ICML, 2006.

[44] K. Noland and M. Sandler. Signal Processing Parameters for Tonality Estimation. 2007.

[45] S. Salvador and P. Chan. Toward accurate dynamic time warping in linear time and space. Intell. Data Anal., 2007.

[46] J. O. Smith. Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications. 2007.

[47] Y. Bengio et al. Greedy Layer-Wise Training of Deep Networks. 2007.

[48] D. Turnbull et al. Towards musical query-by-semantic-description using the CAL500 data set. SIGIR, 2007.

[49] G. E. Poliner and D. P. W. Ellis. A Discriminative Model for Polyphonic Piano Transcription. EURASIP J. Adv. Signal Process., 2007.

[50] D. Müllensiefen et al. Bayesian Model Selection for Harmonic Labelling. 2007.

[51] Y. Bengio. Learning Deep Architectures for AI. Found. Trends Mach. Learn., 2009.

[52] K. Bollacker et al. Freebase: a collaboratively created graph database for structuring human knowledge. SIGMOD Conference, 2008.

[53] D. Rizo et al. Mining Digital Music Score Collections: Melody Extraction and Genre Recognition. 2008.

[54] J. S. Downie. The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoustical Science and Technology, 2008.

[55] A. Graves. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence, 2012.

[56] J. D. Deng, C. Simmermacher, and S. Cranefield. A Study on Feature Analysis for Musical Instrument Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2008.

[57] K. Jarrett et al. What is the best multi-stage architecture for object recognition? ICCV, 2009.

[58] M. E. P. Davies, N. Degara, and M. D. Plumbley. Evaluation Methods for Musical Audio Beat Tracking Algorithms. 2009.

[59] C. E. Rasmussen and C. K. I. Williams. Gaussian processes for machine learning. Adaptive Computation and Machine Learning, 2006.

[60] Y. Bengio et al. Curriculum learning. ICML, 2009.

[61] D. Erhan et al. The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training. AISTATS, 2009.

[62] J. Duchi, E. Hazan, and Y. Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. J. Mach. Learn. Res., 2011.

[63] C. Schörkhuber and A. Klapuri. Constant-Q transform toolbox for music processing. 2010.

[64] Y. E. Kim et al. Exploring automatic music annotation with "acoustically-objective" tags. MIR '10, 2010.

[65] V. Nair and G. E. Hinton. Rectified Linear Units Improve Restricted Boltzmann Machines. ICML, 2010.

[66] T. Bertin-Mahieux, R. J. Weiss, and D. P. W. Ellis. Clustering Beat-Chroma Patterns in a Large Music Database. ISMIR, 2010.

[67] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. AISTATS, 2010.

[68] D. C. Cireşan et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition. Neural Computation, 2010.

[69] E. Brochu, V. M. Cora, and N. de Freitas. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning. ArXiv, 2010.

[70] M. S. Cuthbert and C. Ariza. music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data. ISMIR, 2010.

[71] J. Martens and I. Sutskever. Learning Recurrent Neural Networks with Hessian-Free Optimization. ICML, 2011.

[72] T. Bertin-Mahieux et al. The Million Song Dataset. ISMIR, 2011.

[73] X. Glorot, A. Bordes, and Y. Bengio. Deep Sparse Rectifier Neural Networks. AISTATS, 2011.

[74] Y. Bengio and O. Delalleau. On the Expressive Power of Deep Architectures. ALT, 2011.

[75] P. Papapetrou et al. Embedding-based subsequence matching in time-series databases. TODS, 2011.

[76] M. S. Cuthbert, C. Ariza, and L. Friedland. Feature Extraction and Machine Learning on Symbolic Music using the music21 Toolkit. ISMIR, 2011.

[77] M. Mauch and S. Dixon. A Corpus-based Study of Rhythm Patterns. ISMIR, 2012.

[78] G. E. Hinton et al. Improving neural networks by preventing co-adaptation of feature detectors. ArXiv, 2012.

[79] J. Bergstra and Y. Bengio. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res., 2012.

[80] J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian Optimization of Machine Learning Algorithms. NIPS, 2012.

[81] A. Schindler, R. Mayer, and A. Rauber. Facilitating Comprehensive Benchmarking Experiments on the Million Song Dataset. ISMIR, 2012.

[82] H. Jaeger. Long Short-Term Memory in Echo State Networks: Details of a Simulation Study. 2012.

[83] Y. Bengio. Practical Recommendations for Gradient-Based Training of Deep Architectures. Neural Networks: Tricks of the Trade, 2012.

[84] J. Serrà et al. Measuring the Evolution of Contemporary Western Popular Music. Scientific Reports, 2012.

[85] T. Rakthanmanon et al. Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping. KDD, 2012.

[86] F. Bastien et al. Theano: new features and speed improvements. ArXiv, 2012.

[87] O. Abdel-Hamid et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition. ICASSP, 2012.

[88] E. J. Humphrey and J. P. Bello. Rethinking Automatic Chord Recognition with Convolutional Neural Networks. ICMLA, 2012.

[89] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. NIPS, 2012.

[90] M. Müller et al. Towards Cross-Version Harmonic Analysis of Music. IEEE Transactions on Multimedia, 2012.

[91] T. Bertin-Mahieux and D. P. W. Ellis. Large-Scale Cover Song Recognition Using the 2D Fourier Transform Magnitude. ISMIR, 2012.

[92] Y. Bengio, A. Courville, and P. Vincent. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

[93] M. Grachten et al. Automatic Alignment of Music Performances with Structural Differences. ISMIR, 2013.

[94] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier Nonlinearities Improve Neural Network Acoustic Models. 2013.

[95] C. Cannam et al. MIREX 2019: Vamp plugins from the Centre for Digital Music. 2013.

[96] T. N. Sainath et al. Deep convolutional neural networks for LVCSR. ICASSP, 2013.

[97] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. ICML, 2013.

[98] T. Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.

[99] A. Graves. Generating Sequences With Recurrent Neural Networks. ArXiv, 2013.

[100] I. Sutskever et al. On the importance of initialization and momentum in deep learning. ICML, 2013.

[101] S. Wager, S. Wang, and P. Liang. Dropout Training as Adaptive Regularization. NIPS, 2013.

[102] K. Cho et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP, 2014.

[103] S. Bengio and G. Heigold. Word embeddings for speech recognition. INTERSPEECH, 2014.

[104] M. D. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. ECCV, 2014.

[105] J. Masci et al. Multimodal Similarity-Preserving Hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.

[106] Y. Kim. Convolutional Neural Networks for Sentence Classification. EMNLP, 2014.

[107] K. Ullrich, J. Schlüter, and T. Grill. Boundary Detection in Music Structure Analysis using Convolutional Neural Networks. ISMIR, 2014.

[108] S. Böck, F. Krebs, and G. Widmer. A Multi-model Approach to Beat Tracking Considering Heterogeneous Music Styles. ISMIR, 2014.

[109] P. Foster, M. Mauch, and S. Dixon. Sequential Complexity as a Descriptor for Musical Similarity. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.

[110] C. Raffel et al. mir_eval: A Transparent Implementation of Common MIR Metrics. ISMIR, 2014.

[111] T. Schaul, I. Antonoglou, and D. Silver. Unit Tests for Stochastic Optimization. ICLR, 2014.

[112] A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. ICLR, 2014.

[113] S. Ewert et al. Score-Informed Source Separation for Musical Audio Recordings: An overview. IEEE Signal Processing Magazine, 2014.

[114] A. Graves, G. Wayne, and I. Danihelka. Neural Turing Machines. ArXiv, 2014.

[115] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to Sequence Learning with Neural Networks. NIPS, 2014.

[116] J. Schlüter and S. Böck. Improved musical onset detection with Convolutional Neural Networks. ICASSP, 2014.

[117] Y. N. Dauphin et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization. 2015.

[118] S. Ioffe and C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML, 2015.

[119] K. Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML, 2015.

[120] S. Sukhbaatar et al. End-To-End Memory Networks. NIPS, 2015.

[121] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. CVPR, 2015.

[122] K. Cho, A. Courville, and Y. Bengio. Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks. IEEE Transactions on Multimedia, 2015.

[123] C.-S. Chen et al. Supervised Learning of Semantics-Preserving Hashing via Deep Neural Networks for Large-Scale Image Search. ArXiv, 2015.

[124] B. McFee et al. librosa: v0.4.0. 2015.

[125] H. Schreiber. Improving Genre Annotations for the Million Song Dataset. ISMIR, 2015.

[126] T. Mikolov et al. Learning Longer Memory in Recurrent Neural Networks. ICLR, 2015.

[127] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization. ICLR, 2015.

[128] A. Mordvintsev, C. Olah, and M. Tyka. Inceptionism: Going Deeper into Neural Networks. 2015.

[129] B. McFee, E. J. Humphrey, and J. P. Bello. A Software Framework for Musical Data Augmentation. ISMIR, 2015.

[130] C. Raffel and D. P. W. Ellis. Large-Scale Content-Based Matching of MIDI and Audio Files. ISMIR, 2015.

[131] S. Sigtia, E. Benetos, and S. Dixon. An End-to-End Neural Network for Polyphonic Music Transcription. ArXiv, 2015.

[132] Q. V. Le, N. Jaitly, and G. E. Hinton. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units. ArXiv, 2015.

[133] K. He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ICCV, 2015.

[134] W. Chan et al. Listen, Attend and Spell. ArXiv, 2015.

[135] J. Schlüter and T. Grill. Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks. ISMIR, 2015.

[136] C. Szegedy et al. Going deeper with convolutions. CVPR, 2015.

[137] B. McFee et al. librosa: Audio and Music Signal Analysis in Python. SciPy, 2015.

[138] K. Simonyan and A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. ICLR, 2015.

[139] X. Zhang and Y. LeCun. Text Understanding from Scratch. ArXiv, 2015.

[140] D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR, 2015.

[141] C. Raffel and D. P. W. Ellis. Optimizing DTW-based audio-to-MIDI alignment and matching. ICASSP, 2016.

[142] J. R. Hershey et al. Deep clustering: Discriminative embeddings for segmentation and separation. ICASSP, 2016.

[143] H. Kamper, W. Wang, and K. Livescu. Deep convolutional acoustic word embeddings using word-pair side information. ICASSP, 2016.

[144] C. Raffel and D. P. W. Ellis. Pruning subsequence search with attention-based embedding. ICASSP, 2016.

[145] C. Raffel. Accelerating Multimodal Sequence Retrieval with Convolutional Networks. 2016.

[146] S. Ruder. An overview of gradient descent optimization algorithms. ArXiv, 2016.

[147] D. Bahdanau et al. End-to-end attention-based large vocabulary speech recognition. ICASSP, 2016.

[148] D. Mishkin and J. Matas. All you need is a good init. ICLR, 2016.

[149] C. Raffel and D. P. W. Ellis. Extracting Ground-Truth Information from MIDI Files: A MIDIfesto. ISMIR, 2016.

[150] V. Dumoulin and F. Visin. A guide to convolution arithmetic for deep learning. ArXiv, 2016.

[151] C. C. Aggarwal. Neural Networks and Deep Learning. Springer International Publishing, 2018.