Fault-Tolerant Decompression Method of Compressed Chinese Text Files

Once losslessly compressed data is damaged in transmission, a dedicated fault-tolerant decompression algorithm is required to correct the errors. Previous research proposed a novel fault-tolerant decompression method for English text files that detects and corrects errors using a natural language model of English. In this paper, we extend the scope of this method from compressed English text files to compressed Chinese text files. To apply the algorithm framework to Chinese compressed files, an N-gram-based language model is built that adheres to both the compression coding rules and the grammar rules of Chinese. This model can fully express the prior information of the source, and it solves the character encoding adaptation problem that arises when the algorithm is transferred from English to Chinese. The experimental results demonstrate that the proposed algorithm meets the requirements for fault-tolerant decompression of losslessly compressed Chinese text files.
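
As a rough illustration of how an N-gram language model can supply prior information about the source, the sketch below scores candidate decoded strings with a character-level bigram model and keeps the most plausible one. It is not the model described in the paper: the add-one smoothing, the toy training corpus, and the names `train_bigram`, `log_prob`, and `pick_candidate` are all assumptions made for this example, and the compression coding rules are not modeled at all.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count character unigrams and bigrams over a list of training strings."""
    unigrams, bigrams = Counter(), Counter()
    for text in corpus:
        padded = "^" + text  # "^" is an assumed start-of-string marker
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def log_prob(text, unigrams, bigrams):
    """Log-probability of text under the bigram model with add-one smoothing."""
    vocab = len(unigrams) + 1  # +1 reserves probability mass for unseen characters
    padded = "^" + text
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )

def pick_candidate(candidates, unigrams, bigrams):
    """Return the candidate decoding that the language model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))

# Toy corpus (hypothetical); a real model would be trained on a large Chinese corpus.
corpus = ["今天天气很好", "明天天气不错", "天气预报说今天有雨"]
unigrams, bigrams = train_bigram(corpus)

# Two hypothetical decodings of the same damaged region.
candidates = ["今夫夭气很好", "今天天气很好"]
best = pick_candidate(candidates, unigrams, bigrams)
```

In this toy setting the second candidate wins because all of its character bigrams occur in the training text, whereas the first contains character pairs the model has never seen.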
