A System Integrating Speech Interaction and Vision Sensing Applying in Smart Home Scenario
Zheng Tao | Peilin Liu | Jiuchao Qian | Junhong Chen | Zheng Gong | Junyu Dai | Xiaoguang Zhu | Huaqing Shao