Representation-Oblivious Error Correction by Natural Redundancy

Storage systems need substantially stronger error correction, especially for long-term storage, where accumulated errors can exceed the decoding threshold of error-correcting codes (ECCs). This work presents a new scheme that uses deep learning to perform soft decoding of noisy files based on their natural redundancy. The soft decoding result is then combined with ECCs to improve error correction performance. The scheme is representation-oblivious: it requires no prior knowledge of how data are represented in different types of files (e.g., how symbols are mapped to bits, compressed, and combined with metadata), which makes it more convenient for storage systems to use. Experimental results confirm that the scheme can substantially improve data recovery for different types of files, even when the bit error rates in the files significantly exceed the decoding threshold of the ECC. The code for this work has been publicly released.
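To illustrate what combining a soft decoding result with an ECC can look like, the following is a minimal sketch and not the scheme's actual implementation. It assumes a binary symmetric channel with crossover probability `p`, and it assumes the learned predictor supplies, for each bit, an estimated probability that the bit is 1 that can be treated as prior information independent of the channel observation. The function names (`channel_llr`, `prior_llr`, `combine_llrs`) are hypothetical. Under those assumptions, the two sources of evidence add in the log-likelihood-ratio (LLR) domain, and the summed LLRs could be passed to a soft-input ECC decoder.

```python
import math


def channel_llr(received_bit: int, p: float) -> float:
    """LLR log(P(y|x=0) / P(y|x=1)) for a binary symmetric channel.

    Assumption: crossover probability p with 0 < p < 0.5.
    """
    magnitude = math.log((1.0 - p) / p)
    return magnitude if received_bit == 0 else -magnitude


def prior_llr(prob_one: float, eps: float = 1e-6) -> float:
    """LLR log(P(x=0) / P(x=1)) from a predictor's estimate that the bit is 1.

    The probability is clipped to [eps, 1 - eps] to keep the logarithm finite.
    """
    q = min(max(prob_one, eps), 1.0 - eps)
    return math.log((1.0 - q) / q)


def combine_llrs(received_bits, predicted_prob_one, p):
    """Sum channel and prior LLRs bit by bit (valid if the two are independent)."""
    return [
        channel_llr(y, p) + prior_llr(q)
        for y, q in zip(received_bits, predicted_prob_one)
    ]


def hard_decision(llr: float) -> int:
    """Positive LLR favors bit 0, negative LLR favors bit 1."""
    return 0 if llr >= 0 else 1


if __name__ == "__main__":
    # Hypothetical values: three received bits and predictor outputs.
    received = [0, 1, 0]
    predicted = [0.5, 0.5, 0.99]  # the last bit is strongly predicted to be 1
    llrs = combine_llrs(received, predicted, p=0.1)
    print([hard_decision(v) for v in llrs])
```

In this toy case the third bit is received as 0, but a confident prediction outweighs the channel evidence and flips the decision. In a full system the summed LLRs would be handed to a soft-input decoder (for example, belief propagation for an LDPC code) rather than thresholded directly.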
