Augmenting Recurrent Neural Networks Resilience by Dropout

This brief discusses the simple idea that dropout regularization can be used to efficiently induce resilience to missing inputs at prediction time in a generic neural network. We show that the approach can be effective on tasks where imputation strategies often fail, namely those involving recurrent neural networks and scenarios in which whole sequences of input observations are missing. The experimental analysis assesses the accuracy–resiliency tradeoff in multiple recurrent models, including reservoir computing methods, on real-world ambient intelligence and biomedical time series.
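As a rough illustration of the idea, the sketch below applies dropout at the input layer so that an entire input sequence (one sensor channel across all time steps) is zeroed during training. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `drop_input_channels`, the per-channel Bernoulli mask, the choice of zero as the fill value, and the absence of any rescaling of the kept inputs are all hypothetical choices made for this example.

```python
import numpy as np

def drop_input_channels(x, p_drop, rng):
    """Zero whole input channels of one sequence with probability p_drop.

    x      : array of shape (time_steps, n_inputs), one input sequence
    p_drop : probability of dropping each input channel (assumed hyperparameter)
    rng    : numpy.random.Generator used to sample the mask
    Returns the masked sequence and the boolean keep-mask over channels.
    """
    # One Bernoulli draw per channel, shared by all time steps, so a dropped
    # channel is missing for the whole sequence (assumption of this sketch).
    keep = rng.random(x.shape[1]) >= p_drop
    # Broadcasting multiplies every time step by the same channel mask.
    return x * keep, keep

# Toy usage: 5 time steps, 4 input channels.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
x_masked, keep = drop_input_channels(x, p_drop=0.5, rng=rng)
```

In training, a fresh mask would be sampled for each sequence before it is fed to the recurrent model; at prediction time, inputs that are genuinely missing would be filled with the same zero value the network saw during training.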
