Paper information - Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching - 字舞流文

Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching

Colin Raffel

[1] M. Kendall. A New Measure of Rank Correlation , 1938 .

[2] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions , 1943 .

[3] J. Knott. The organization of behavior: A neuropsychological theory , 1951 .

[4] F. Rosenblatt,et al. The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological review.

[5] Boris Polyak. Some methods of speeding up the convergence of iteration methods , 1964 .

[6] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis , 1964 .

[7] Peter E. Hart,et al. Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[8] S. Chiba,et al. Dynamic programming algorithm optimization for spoken word recognition , 1978 .

[9] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization , 1983 .

[10] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.

[11] F. Richard Moore,et al. The Dysfunctions of MIDI , 1988, ICMC.

[12] Lorien Y. Pratt,et al. Comparing Biases for Minimal Network Construction with Back-Propagation , 1988, NIPS.

[13] George Cybenko,et al. Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[14] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[15] Kurt Hornik,et al. Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[16] W S McCulloch,et al. A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[17] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.

[18] Judith C. Brown. Calculation of a constant Q spectral transform , 1991 .

[19] Judith C. Brown,et al. An efficient algorithm for the calculation of a constant Q transform , 1992 .

[20] Donald J. Berndt,et al. Using Dynamic Time Warping to Find Patterns in Time Series , 1994, KDD Workshop.

[21] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.

[22] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.

[23] Perry R. Cook,et al. Music, cognition, and computerized sound: an introduction to psychoacoustics , 1999 .

[24] Takuya Fujishima,et al. Realtime Chord Recognition of Musical Sound: a System Using Common Lisp Music , 1999, ICMC.

[25] Chi Lap Yip,et al. Selection of melody lines for music databases , 2000, Proceedings 24th Annual International Computer Software and Applications Conference. COMPSAC2000.

[26] Eric Jones,et al. SciPy: Open Source Scientific Tools for Python , 2001 .

[27] Michael Good,et al. MusicXML for notation and analysis , 2001 .

[28] Simon Dixon,et al. Automatic Extraction of Tempo and Beat From Expressive Performances , 2001 .

[29] Eamonn J. Keogh,et al. Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases , 2001, Knowledge and Information Systems.

[30] George Tzanetakis,et al. Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[31] George Tzanetakis,et al. Polyphonic audio matching and alignment for music retrieval , 2003, 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (IEEE Cat. No.03TH8684).

[32] Daniel P. W. Ellis,et al. Ground-truth transcriptions of real music from force-aligned MIDI syntheses , 2003, ISMIR.

[33] Marina Bosi,et al. Introduction to Digital Audio Coding and Standards , 2004, J. Electronic Imaging.

[34] Eamonn J. Keogh,et al. Everything you know about Dynamic Time Warping is Wrong , 2004 .

[35] Kunihiko Fukushima,et al. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position , 1980, Biological Cybernetics.

[36] Daniel P. W. Ellis,et al. A Large-Scale Evaluation of Acoustic and Subjective Music-Similarity Measures , 2004, Computer Music Journal.

[37] Gerhard Widmer,et al. MATCH: A Music Alignment Tool Chest , 2005, ISMIR.

[38] Daniel P. W. Ellis,et al. Song-Level Features and Support Vector Machines for Music Classification , 2005, ISMIR.

[39] Mark B. Sandler,et al. A tutorial on onset detection in music signals , 2005, IEEE Transactions on Speech and Audio Processing.

[40] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[41] Ichiro Fujinaga,et al. jSymbolic: A Feature Extractor for MIDI Files , 2006, ICMC.

[42] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[43] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.

[44] Mark Sandler,et al. Signal Processing Parameters for Tonality Estimation , 2007 .

[45] Philip Chan,et al. Toward accurate dynamic time warping in linear time and space , 2007, Intell. Data Anal..

[46] Smith,et al. Mathematics of the Discrete Fourier Transform (DFT) with Audio Applications , 2007 .

[47] Thomas Hofmann,et al. Greedy Layer-Wise Training of Deep Networks , 2007 .

[48] Gert R. G. Lanckriet,et al. Towards musical query-by-semantic-description using the CAL500 data set , 2007, SIGIR.

[49] Daniel P. W. Ellis,et al. A Discriminative Model for Polyphonic Piano Transcription , 2007, EURASIP J. Adv. Signal Process..

[50] Daniel Müllensiefen,et al. Bayesian Model Selection for Harmonic Labelling , 2007 .

[51] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[52] Praveen Paritosh,et al. Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[53] David Rizo,et al. Mining Digital Music Score Collections: Melody Extraction and Genre Recognition , 2008 .

[54] J. Stephen Downie,et al. The music information retrieval evaluation exchange (2005-2007): A window into music information retrieval research , 2008, Acoustical Science and Technology.

[55] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.

[56] Stephen Cranefield,et al. A Study on Feature Analysis for Musical Instrument Classification , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[57] Yann LeCun,et al. What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[58] Norberto Degara,et al. Evaluation Methods for Musical Audio Beat Tracking Algorithms , 2009 .

[59] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[60] Jason Weston,et al. Curriculum learning , 2009, ICML '09.

[61] Pascal Vincent,et al. The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training , 2009, AISTATS.

[62] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[63] Christian Schörkhuber. Constant-Q Transform Toolbox for Music Processing , 2010 .

[64] Youngmoo E. Kim,et al. Exploring automatic music annotation with "acoustically-objective" tags , 2010, MIR '10.

[65] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[66] Thierry Bertin-Mahieux,et al. Clustering Beat-Chroma Patterns in a Large Music Database , 2010, ISMIR.

[67] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[68] Luca Maria Gambardella,et al. Deep, Big, Simple Neural Nets for Handwritten Digit Recognition , 2010, Neural Computation.

[69] Nando de Freitas,et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning , 2010, ArXiv.

[70] Christopher Ariza,et al. Music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data , 2010, ISMIR.

[71] Ilya Sutskever,et al. Learning Recurrent Neural Networks with Hessian-Free Optimization , 2011, ICML.

[72] Thierry Bertin-Mahieux,et al. The Million Song Dataset , 2011, ISMIR.

[73] Yoshua Bengio,et al. Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[74] Yoshua Bengio,et al. On the Expressive Power of Deep Architectures , 2011, ALT.

[75] Dimitrios Gunopulos,et al. Embedding-based subsequence matching in time-series databases , 2011, TODS.

[76] Christopher Ariza,et al. Feature Extraction and Machine Learning on Symbolic Music using the music21 Toolkit , 2011, ISMIR.

[77] Simon Dixon,et al. A Corpus-based Study of Rhythm Patterns , 2012, ISMIR.

[78] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[79] Yoshua Bengio,et al. Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[80] Jasper Snoek,et al. Practical Bayesian Optimization of Machine Learning Algorithms , 2012, NIPS.

[81] Andreas Rauber,et al. Facilitating Comprehensive Benchmarking Experiments on the Million Song Dataset , 2012, ISMIR.

[82] Herbert Jaeger,et al. Long Short-Term Memory in Echo State Networks: Details of a Simulation Study , 2012 .

[83] Yoshua Bengio,et al. Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.

[84] Marián Boguñá,et al. Measuring the Evolution of Contemporary Western Popular Music , 2012, Scientific Reports.

[85] Eamonn J. Keogh,et al. Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping , 2012, KDD.

[86] Razvan Pascanu,et al. Theano: new features and speed improvements , 2012, ArXiv.

[87] Gerald Penn,et al. Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[88] Juan Pablo Bello,et al. Rethinking Automatic Chord Recognition with Convolutional Neural Networks , 2012, 2012 11th International Conference on Machine Learning and Applications.

[89] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[90] Meinard Müller,et al. Towards Cross-Version Harmonic Analysis of Music , 2012, IEEE Transactions on Multimedia.

[91] Thierry Bertin-Mahieux,et al. Large-Scale Cover Song Recognition Using the 2D Fourier Transform Magnitude , 2012, ISMIR.

[92] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93] Gerhard Widmer,et al. Automatic Alignment of Music Performances with Structural Differences , 2013, ISMIR.

[94] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[95] S. Dixon,et al. MIREX 2019: Vamp Plugins from the Centre for Digital Music , 2013 .

[96] Tara N. Sainath,et al. Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[97] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.

[98] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[99] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[100] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.

[101] Sida I. Wang,et al. Dropout Training as Adaptive Regularization , 2013, NIPS.

[102] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[103] Georg Heigold,et al. Word embeddings for speech recognition , 2014, INTERSPEECH.

[104] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[105] Jürgen Schmidhuber,et al. Multimodal Similarity-Preserving Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[106] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[107] Thomas Grill,et al. Boundary Detection in Music Structure Analysis using Convolutional Neural Networks , 2014, ISMIR.

[108] Florian Krebs,et al. A Multi-model Approach to Beat Tracking Considering Heterogeneous Music Styles , 2014, ISMIR.

[109] Simon Dixon,et al. Sequential Complexity as a Descriptor for Musical Similarity , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[110] Daniel P. W. Ellis,et al. MIR_EVAL: A Transparent Implementation of Common MIR Metrics , 2014, ISMIR.

[111] Tom Schaul,et al. Unit Tests for Stochastic Optimization , 2013, ICLR.

[112] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.

[113] Mark D. Plumbley,et al. Score-Informed Source Separation for Musical Audio Recordings: An overview , 2014, IEEE Signal Processing Magazine.

[114] Alex Graves,et al. Neural Turing Machines , 2014, ArXiv.

[115] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[116] Sebastian Böck,et al. Improved musical onset detection with Convolutional Neural Networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[117] Harm de Vries,et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization , 2015 .

[118] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[119] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[120] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.

[121] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[122] Yoshua Bengio,et al. Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks , 2015, IEEE Transactions on Multimedia.

[123] Chu-Song Chen,et al. Supervised Learning of Semantics-Preserving Hashing via Deep Neural Networks for Large-Scale Image Search , 2015, ArXiv.

[124] Colin Raffel,et al. librosa: v0.4.0 , 2015 .

[125] Hendrik Schreiber,et al. Improving Genre Annotations for the Million Song Dataset , 2015, ISMIR.

[126] Marc'Aurelio Ranzato,et al. Learning Longer Memory in Recurrent Neural Networks , 2014, ICLR.

[127] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[128] Alexander Mordvintsev,et al. Inceptionism: Going Deeper into Neural Networks , 2015 .

[129] Juan Pablo Bello,et al. A Software Framework for Musical Data Augmentation , 2015, ISMIR.

[130] Daniel P. W. Ellis,et al. Large-Scale Content-Based Matching of MIDI and Audio Files , 2015, ISMIR.

[131] Simon Dixon,et al. An End-to-End Neural Network for Polyphonic Music Transcription , 2015, ArXiv.

[132] Geoffrey E. Hinton,et al. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units , 2015, ArXiv.

[133] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[134] Quoc V. Le,et al. Listen, Attend and Spell , 2015, ArXiv.

[135] Thomas Grill,et al. Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks , 2015, ISMIR.

[136] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[137] Colin Raffel,et al. librosa: Audio and Music Signal Analysis in Python , 2015, SciPy.

[138] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[139] Xiang Zhang,et al. Text Understanding from Scratch , 2015, ArXiv.

[140] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[141] Daniel P. W. Ellis,et al. Optimizing DTW-based audio-to-MIDI alignment and matching , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[142] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[143] Karen Livescu,et al. Deep convolutional acoustic word embeddings using word-pair side information , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[144] Daniel P. W. Ellis,et al. Pruning subsequence search with attention-based embedding , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[145] Colin Raffel. Accelerating Multimodal Sequence Retrieval with Convolutional Networks , 2016 .

[146] Sebastian Ruder,et al. An overview of gradient descent optimization algorithms , 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.

[147] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[148] Jiri Matas,et al. All you need is a good init , 2015, ICLR.

[149] Daniel P. W. Ellis,et al. Extracting Ground-Truth Information from MIDI Files: A MIDIfesto , 2016, ISMIR.

[150] Francesco Visin,et al. A guide to convolution arithmetic for deep learning , 2016, ArXiv.

[151] Charu C. Aggarwal,et al. Neural Networks and Deep Learning , 2018, Springer International Publishing.