Attention-driven Multi-sensor Selection
暂无分享,去创建一个
Stefan Braun | Shih-Chii Liu | Jithendar Anumula | Daniel Neil | Enea Ceolini | Shih-Chii Liu | Daniel Neil | Stefan Braun | Jithendar Anumula | Enea Ceolini
[1] Sepp Hochreiter,et al. Self-Normalizing Neural Networks , 2017, NIPS.
[2] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[3] Hiroaki Kitano,et al. Active audition system and humanoid exterior design , 2000, Proceedings. 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000) (Cat. No.00CH37113).
[4] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Stefanos Zafeiriou,et al. 300 Faces In-The-Wild Challenge: database and results , 2016, Image Vis. Comput..
[6] Ian Lane,et al. End-to-End Speech Recognition with Auditory Attention for Multi-Microphone Distance Speech Recognition , 2017, INTERSPEECH.
[7] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[8] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[9] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition. , 2006, The Journal of the Acoustical Society of America.
[10] Christian Wolf,et al. Modout: Learning to Fuse Modalities via Stochastic Regularization , 2016 .
[11] Hiroshi G. Okuno,et al. Robot audition: Its rise and perspectives , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] B. V. K. Vijaya Kumar,et al. A multi-sensor fusion system for moving object detection and tracking in urban driving environments , 2014, 2014 IEEE International Conference on Robotics and Automation (ICRA).
[13] John R. Hershey,et al. Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming , 2017, IEEE Journal of Selected Topics in Signal Processing.
[14] Katsutoshi Itoyama,et al. Posture estimation of hose-shaped robot using microphone array localization , 2013, 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[15] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[16] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[19] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[20] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[21] James M. Rehg,et al. Aggressive Deep Driving: Combining Convolutional Neural Networks and Model Predictive Control , 2017, CoRL.
[22] Xavier Anguera Miró,et al. Acoustic Beamforming for Speaker Diarization of Meetings , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[23] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[24] Anca D. Dragan,et al. Learning Robot Objectives from Physical Human Interaction , 2017, CoRL.
[25] John R. Hershey,et al. Attention-Based Multimodal Fusion for Video Description , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Shimon Whiteson,et al. LipNet: Sentence-level Lipreading , 2016, ArXiv.
[27] Jürgen Schmidhuber,et al. A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots , 2016, IEEE Robotics and Automation Letters.
[28] Jon Barker,et al. An analysis of environment, microphone and data simulation mismatches in robust speech recognition , 2017, Comput. Speech Lang..
[29] Yoshua Bengio,et al. Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks , 2015, IEEE Transactions on Multimedia.
[30] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[31] Steven Bohez,et al. Sensor fusion for robot control through deep reinforcement learning , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[32] Xin Zhang,et al. End to End Learning for Self-Driving Cars , 2016, ArXiv.
[33] Manuela M. Veloso,et al. Learning End-to-end Multimodal Sensor Policies for Autonomous Navigation , 2017, CoRL.
[34] Christian Wolf,et al. ModDrop: Adaptive Multi-Modal Gesture Recognition , 2014, IEEE Trans. Pattern Anal. Mach. Intell..