[1] Rohit Jain,et al. MyLipper: A Personalized System for Speech Reconstruction using Multi-view Visual Feeds , 2018, 2018 IEEE International Symposium on Multimedia (ISM).
[2] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[3] Shin'ichi Satoh,et al. Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed , 2018, ACM Multimedia.
[4] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.
[5] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Dan Alistarh,et al. Model compression via distillation and quantization , 2018, ICLR.
[7] Andrew Lavin,et al. Fast Algorithms for Convolutional Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Shmuel Peleg,et al. Improved Speech Reconstruction from Silent Video , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).
[9] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[10] Q. Summerfield,et al. Lipreading and audio-visual speech perception. , 1992, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.
[11] Tetsuya Ogata,et al. Audio-visual speech recognition using deep learning , 2014, Applied Intelligence.
[12] Stephen P. Morse,et al. The Intel 8086 Microprocessor: a 16-bit Evolution of the 8080 , 1978, Computer.
[13] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[14] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[15] Mohammed Bennamoun,et al. Listening with Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[16] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Satoshi Nakamura,et al. Compressing End-to-end ASR Networks by Tensor-Train Decomposition , 2018, INTERSPEECH.
[18] Jie Zhang,et al. Dynamically Hierarchy Revolution: DirNet for Compressing Recurrent Neural Network on Mobile Devices , 2018, IJCAI.
[19] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Kurt Keutzer,et al. Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[21] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[22] Tsuhan Chen,et al. Audio-visual integration in multimodal communication , 1998, Proc. IEEE.
[23] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[24] Vladlen Koltun,et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.
[25] Ben P. Milner,et al. Reconstructing intelligible audio speech from visual speech features , 2015, INTERSPEECH.
[26] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[27] Jeff Johnson,et al. Fast Convolutional Nets With fbfft: A GPU Performance Evaluation , 2014, ICLR.
[28] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Johan A. du Preez,et al. Audio-Visual Speech Recognition using SciPy , 2010 .
[30] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[31] Wonyong Sung,et al. Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices , 2018, NeurIPS.
[32] Maja Pantic,et al. End-to-End Multi-View Lipreading , 2017, BMVC.
[33] Jian Cheng,et al. Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Igor Carron,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .
[35] Jinjun Xiong,et al. Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[36] Jürgen Schmidhuber,et al. Lipreading with long short-term memory , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Xiangyu Zhang,et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design , 2018, ECCV.
[39] Yixin Chen,et al. Compressing Neural Networks with the Hashing Trick , 2015, ICML.
[40] David Taylor. Hearing by Eye: The Psychology of Lip-Reading , 1988 .
[41] Alexander I. Rudnicky,et al. Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[42] Dacheng Tao,et al. On Compressing Deep Models by Low Rank and Sparse Decomposition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Liqiang Zhang,et al. 3D Depthwise Convolution: Reducing Model Parameters in 3D Vision Tasks , 2018, Canadian Conference on AI.
[44] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[45] Richard Socher,et al. Quasi-Recurrent Neural Networks , 2016, ICLR.
[46] Stéphane Mallat,et al. Rigid-Motion Scattering for Texture Classification , 2014, ArXiv.
[47] Ian McGraw,et al. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] Patrice Y. Simard,et al. High Performance Convolutional Neural Networks for Document Processing , 2006 .
[49] Samuel Pachoud,et al. Macro-cuboïd based probabilistic matching for lip-reading digits , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[50] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[51] Mark Horowitz,et al. 1.1 Computing's energy problem (and what we can do about it) , 2014, 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC).
[52] Gaël Varoquaux,et al. The NumPy Array: A Structure for Efficient Numerical Computation , 2011, Computing in Science & Engineering.
[53] Ivan V. Oseledets,et al. Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition , 2014, ICLR.
[54] Maja Pantic,et al. Deep complementary bottleneck features for visual speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[55] Yann LeCun,et al. Fast Training of Convolutional Networks through FFTs , 2013, ICLR.