Survey on Deep Neural Networks in Speech and Vision Systems

This survey reviews state-of-the-art deep neural network architectures, algorithms, and systems for vision and speech applications. Recent advances in deep artificial neural network algorithms and architectures have spurred rapid innovation and development of intelligent vision and speech systems. With the availability of vast amounts of sensor data, cloud computing for processing and training deep neural networks, and increasingly sophisticated mobile and embedded technology, next-generation intelligent systems are poised to revolutionize personal and commercial computing. This survey begins with the background and evolution of some of the most successful deep learning models for intelligent vision and speech systems to date. It then gives an overview of large-scale industrial research and development efforts to highlight future trends and prospects for intelligent vision and speech systems. Robust and efficient intelligent systems demand low latency and high fidelity on resource-constrained hardware platforms such as mobile devices, robots, and automobiles. Therefore, this survey also summarizes key challenges and recent successes in running deep neural networks on hardware-restricted platforms, i.e., within limited memory, battery life, and processing capabilities. Finally, it discusses emerging applications of vision and speech across disciplines such as affective computing, intelligent transportation, and precision medicine. To our knowledge, this paper provides one of the most comprehensive surveys of the latest developments in intelligent vision and speech applications from the perspectives of both software and hardware systems. Many of these emerging technologies based on deep neural networks show tremendous promise to revolutionize research and development for future vision and speech systems.
