Automatic Singing Transcription Based on Encoder-decoder Recurrent Neural Networks with a Weakly-supervised Attention Mechanism
Masataka Goto | Ryo Nishikimi | Satoru Fukayama | Eita Nakamura | Kazuyoshi Yoshii
[1] Andrew W. Senior, et al. Fast and accurate recurrent neural network acoustic models for speech recognition, 2015, INTERSPEECH.
[2] Jordan B. L. Smith, et al. Probabilistic transcription of sung melody using a pitch dynamic model, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Masataka Goto, et al. Scale- and Rhythm-Aware Musical Note Estimation for Vocal F0 Trajectories Based on a Semi-Tatum-Synchronous Hierarchical Hidden Semi-Markov Model, 2017, ISMIR.
[4] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[5] Masataka Goto, et al. RWC Music Database: Popular, Classical and Jazz Music Databases, 2002, ISMIR.
[6] Katsutoshi Itoyama, et al. Singing voice analysis and editing based on mutually dependent F0 estimation and source separation, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] D. J. Hermes, et al. Measurement of pitch by subharmonic summation, 1988, The Journal of the Acoustical Society of America.
[8] Gaël Richard, et al. Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[9] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[10] Hiromasa Fujihara, et al. Songle: A Web Service for Active Music Listening Improved by User Contributions, 2011, ISMIR.
[11] DeLiang Wang, et al. Separation of singing voice from music accompaniment for monaural recordings, 2007.
[12] Simon Dixon, et al. PYIN: A fundamental frequency estimator using probabilistic threshold distributions, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Masataka Goto, et al. A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals, 2004, Speech Commun.
[14] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[15] Masataka Goto, et al. A Learning-Based Quantization: Unsupervised Estimation of the Model Parameters, 2003, ICMC.
[16] Tara N. Sainath, et al. An Analysis of "Attention" in Sequence-to-Sequence Models, 2017, INTERSPEECH.
[17] Ryo Nishikimi, et al. Probabilistic Sequential Patterns for Singing Transcription, 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[18] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[20] Tomoshi Otsuki, et al. Hidden Markov model for automatic transcription of MIDI signals, 2002, 2002 IEEE Workshop on Multimedia Signal Processing.
[21] Emilia Gómez, et al. Automatic Transcription of Flamenco Singing From Polyphonic Music Recordings, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Masataka Goto, et al. AIST Annotation for the RWC Music Database, 2006, ISMIR.
[23] Jordi Bonada, et al. A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs, 2017.
[24] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[25] Mark Steedman, et al. Multi-Pitch Detection and Voice Assignment for A Cappella Recordings of Multiple Singers, 2017, ISMIR.
[26] Emilio Molina, et al. SiPTH: Singing Transcription Based on Hysteresis Defined on the Pitch-Time Curve, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[27] Christopher Raphael, et al. A hybrid graphical model for rhythmic parsing, 2002, Artif. Intell.
[28] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[29] Paris Smaragdis, et al. Singing-voice separation from monaural recordings using robust principal component analysis, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Emilia Gómez, et al. Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics, 2012, IEEE Transactions on Audio, Speech, and Language Processing.