Similarity-based document classification method

A similarity-based document classification method, belonging to the field of computer storage systems, which solves the problem that conventional classification methods require a large amount of computation and memory. The invention comprises a blocking step, a checksum calculation step, a statistics step, and a classification step. The invention does not require random access to the file data: a single pass from start to finish completes blocking, checksum calculation, statistics, sorting, and classification. It efficiently obtains association information between files and groups files that are similar at the binary data level into one class, so that the category to which a given file belongs is uniquely identified; to determine whether two files are similar, it is only necessary to check whether they belong to the same category. Processing is fast, memory use is low, and accuracy can be adjusted through the operating parameters. The method is applicable to all types of applications that need to obtain data similarity, in particular deduplication-related data storage applications.
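
The single-pass pipeline described above (blocking, checksum calculation, statistics, sorting, classification) can be illustrated with a minimal Python sketch. The abstract does not specify how blocks are cut, which checksum is used, or how the category is derived from the statistics, so everything concrete here is an assumption made for illustration: fixed-size blocks, CRC-32 as the checksum, and a category label formed from the `top_k` most frequent block checksums. The names `classify_stream`, `same_class`, `block_size`, and `top_k` are hypothetical and do not come from the source.

```python
import io
import zlib
from collections import Counter


def classify_stream(stream, block_size=4096, top_k=4):
    """Return a category label for a byte stream, read once from start to end.

    Assumptions (not taken from the source): fixed-size blocks, CRC-32 as the
    checksum, and a label made of the top_k most frequent checksums.
    """
    counts = Counter()
    while True:
        block = stream.read(block_size)      # blocking step: sequential reads only
        if not block:
            break
        counts[zlib.crc32(block)] += 1       # checksum step + statistics step
    # Sorting step: most frequent checksums first, ties broken by checksum value
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Classification step: the leading checksums serve as the category label
    return tuple(checksum for checksum, _ in ranked[:top_k])


def same_class(label_a, label_b):
    """Two files are treated as similar when their category labels are equal."""
    return label_a == label_b


if __name__ == "__main__":
    # Two streams that share most blocks, and one unrelated stream
    file_a = io.BytesIO(b"A" * 4096 * 3 + b"B" * 4096)
    file_b = io.BytesIO(b"A" * 4096 * 3 + b"C" * 4096)
    file_c = io.BytesIO(b"Z" * 4096 * 4)
    label_a = classify_stream(file_a, top_k=1)
    label_b = classify_stream(file_b, top_k=1)
    label_c = classify_stream(file_c, top_k=1)
    print(same_class(label_a, label_b), same_class(label_a, label_c))
```

In this sketch `block_size` and `top_k` play the role of the operating parameters: a larger `top_k` makes the label stricter, so fewer files fall into the same category, while a smaller one groups files more loosely. Memory use depends on the number of distinct block checksums rather than on file size.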