Paper information - Classification of run-length encoded binary data

Classification of run-length encoded binary data

When classifying data with binary features, distance computation is conventionally carried out by examining every feature. We represent the given binary data in run-length encoded form, which yields a compact, compressed representation. Further, we propose an algorithm that computes the Manhattan distance directly between two such run-length encoded patterns. We show that classifying the data in this compressed form improves computation time by a factor of 5 on a large handwritten data set. The scheme is useful for clustering and classifying large data sets with methods that depend on distance measures.
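To make the idea of computing the distance directly on the compressed form concrete, here is a minimal sketch. It is not the authors' algorithm, whose details are not given in the abstract. It assumes one particular encoding convention (alternating run lengths that always start with a run of 0s, so a pattern beginning with 1 gets a leading run of length 0), and the names `rle_encode` and `manhattan_rle` are hypothetical. For binary patterns, the Manhattan distance equals the number of positions at which the two patterns differ.

```python
from typing import List, Sequence


def rle_encode(bits: Sequence[int]) -> List[int]:
    """Encode a 0/1 sequence as alternating run lengths.

    Assumed convention: runs alternate 0s, 1s, 0s, ... starting with 0s.
    A sequence that starts with 1 therefore gets a leading run of length 0.
    """
    runs: List[int] = []
    current = 0  # value of the run being counted
    count = 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)
            current = b
            count = 1
    runs.append(count)
    return runs


def manhattan_rle(a: Sequence[int], b: Sequence[int]) -> int:
    """Manhattan distance between two run-length encoded binary patterns.

    Walks both run lists in lockstep; the decoded vectors are never built.
    The bit value of run k is k % 2 under the convention above.
    """
    if sum(a) != sum(b):
        raise ValueError("patterns must decode to the same length")
    i = j = 0
    rem_a = a[0] if a else 0  # unconsumed length of the current run in a
    rem_b = b[0] if b else 0  # unconsumed length of the current run in b
    distance = 0
    while i < len(a) and j < len(b):
        if rem_a == 0:  # current run of a is used up: move to the next run
            i += 1
            if i < len(a):
                rem_a = a[i]
            continue
        if rem_b == 0:  # same for b
            j += 1
            if j < len(b):
                rem_b = b[j]
            continue
        step = min(rem_a, rem_b)  # length of the overlapping segment
        if i % 2 != j % 2:  # the two runs hold different bit values
            distance += step
        rem_a -= step
        rem_b -= step
    return distance


x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 1, 1, 0]
print(manhattan_rle(rle_encode(x), rle_encode(y)))
```

The loop does work proportional to the number of runs in the two patterns rather than the number of features, which is where a saving can arise when runs are long. The factor of 5 reported in the abstract comes from the authors' own experiments and is not something this sketch measures.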

T. Ravindra Babu | M. Narasimha Murty | V. K. Agrawal
