Optimising Storage Resource using Morpheme based Text Compression Technique

In this paper, we present a morpheme-based text compression technique for optimising storage resources. The technique decomposes words into their morphemes and then produces code representations of those morphemes for compression. The proposed algorithm was implemented for English-language text and applied to 30 texts of different lengths, collected from a variety of sources and of differing natures. Compression efficiency increases with the number of long, repetitive morphemes in the input data. To the best of our knowledge, the resulting implementation is the first to demonstrate lossless compression using such a technique. We illustrate its suitability and effectiveness on benchmark files of several sizes (small, middle-sized, large, and very large) taken from real-world applications. The results indicate a good compression performance of 98%, making the approach an attractive one. A further virtue of the method is that it can be applied dynamically: degraded compression can be compensated for by appending morphemes identified within the document to the dictionary. The evaluation experiments show that, where storage space is the primary consideration, the morpheme-based text compression technique is an efficient approach for compressing text data.
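The abstract describes three steps: decompose words into morphemes, replace the morphemes with codes, and append morphemes identified in the document to the dictionary. The sketch below illustrates those steps under stated assumptions and is not the authors' implementation. It assumes a small hypothetical dictionary (`MORPHEMES`), uses greedy longest-match segmentation as a stand-in for real morphological analysis, restricts input to ASCII, and represents each dictionary morpheme as a single byte in the range 128 to 255, so at most 128 morphemes fit. The helper names `segment`, `encode`, `decode` and `extend_dictionary` are hypothetical.

```python
import re

# Hypothetical starter dictionary (illustrative only; the paper's actual
# morpheme dictionary is not given in the abstract).
MORPHEMES = ["un", "re", "pre", "dis", "ing", "ed", "ness", "ment", "able",
             "tion", "ly", "compress", "store", "optim", "ise", "text"]

CODE_BASE = 128                # bytes >= 128 are morpheme codes, bytes < 128 are ASCII literals
MAX_ENTRIES = 256 - CODE_BASE  # at most 128 morphemes fit in one-byte codes


def segment(word, by_length):
    """Greedily split a lowercase word into the longest matching morphemes.
    Characters not covered by any morpheme are kept as one-character literals."""
    parts, i = [], 0
    while i < len(word):
        match = next((m for m in by_length if word.startswith(m, i)), None)
        piece = match if match else word[i]
        parts.append(piece)
        i += len(piece)
    return parts


def encode(text, dictionary):
    """Encode ASCII text: dictionary morphemes become one-byte codes, all else is copied."""
    if len(dictionary) > MAX_ENTRIES:
        raise ValueError("dictionary too large for one-byte codes")
    index = {m: k for k, m in enumerate(dictionary)}
    by_length = sorted(dictionary, key=len, reverse=True)  # try longest morphemes first
    out = bytearray()
    # Alternate runs of lowercase letters (words) and everything else.
    for word, other in re.findall(r"([a-z]+)|([^a-z]+)", text):
        if word:
            for part in segment(word, by_length):
                if part in index:
                    out.append(CODE_BASE + index[part])
                else:
                    out += part.encode("ascii")
        else:
            out += other.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    return bytes(out)


def decode(data, dictionary):
    """Invert encode() by expanding codes through the same dictionary (lossless)."""
    return "".join(dictionary[b - CODE_BASE] if b >= CODE_BASE else chr(b)
                   for b in data)


def extend_dictionary(dictionary, candidates):
    """Append newly identified morphemes; appending keeps existing codes valid."""
    for m in candidates:
        if m and m not in dictionary and len(dictionary) < MAX_ENTRIES:
            dictionary.append(m)
    return dictionary


if __name__ == "__main__":
    d = list(MORPHEMES)
    text = "restoring compression of optimised text"
    packed = encode(text, d)
    assert decode(packed, d) == text
    print(len(text), "->", len(packed))  # 39 -> 20
    extend_dictionary(d, ["ion"])        # a morpheme "identified within the document"
    print(len(encode(text, d)))          # 18
```

Because new morphemes are appended rather than inserted, previously compressed data still decodes with the extended dictionary; the decoder must, however, hold the same dictionary as the encoder, so a practical system would need to store or share it. Greedy matching is only an approximation of morphological decomposition and can split words at boundaries a linguist would not choose.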

Joseph Kobina Panford | J. B. Hayfron-Acquah | Rockson Kwasi Afriyie