Paper information - Research and design of similar file forensics system based on fuzzy hash

Research and design of similar file forensics system based on fuzzy hash

The collection and identification of digital evidence is an essential procedure in file forensics; it relies on manual retrieval, traditional hash techniques, keyword queries, and similar methods. Because electronic documents are easily changed or tampered with, finding files similar to a target file becomes important for forensic work. However, traditional forensic systems are usually based on keyword search or on scanning entire files, and neither approach offers the speed and accuracy that current forensic tasks require. Because the fuzzy hash algorithm is of great value for calculating the similarity between files, in this paper we analyze the process of the fuzzy hash algorithm and an improvement to it, and verify the accuracy and efficiency of the improved algorithm. We apply fuzzy hash technology to file forensics and design a more adaptable and more accurate file forensics system. The system follows the process of acquiring storage media, collecting evidence files, and preserving evidence files, and it combines text mining, data recovery, text clustering, classification, and other technologies. We believe this system is a breakthrough on existing problems in file forensics, such as heavy manual workload and low accuracy.

Jiang Jianguo | Liu Chao | Yu Qian | Chen Jiuming | Liu Kunying

[1] Duminda Wijesekera, et al. A Highly Recoverable and Efficient Filesystem, 2014.

[2] Jesse D. Kornblum. Identifying almost identical files using context triggered piecewise hashing, 2006, Digit. Investig.

[3] Patrick Van Eecke, et al. Legal aspects of text mining, 2014, LREC.

[4] Khairullah Khan, et al. Mining opinion components from unstructured reviews: A review, 2014, J. King Saud Univ. Comput. Inf. Sci.

[5] Ricardo M. Marcacini, et al. Interactive textual feature selection for consensus clustering, 2015, Pattern Recognit. Lett.

[6] Nizar Bouguila, et al. A variational Bayes model for count data learning and classification, 2014, Eng. Appl. Artif. Intell.

[7] Di Xiao, et al. Improvement and performance analysis of a novel hash function based on chaotic neural network, 2011, Neural Computing and Applications.

[8] Carolin Huhn, et al. Electromigrative separation techniques in forensic science: combining selectivity, sensitivity, and robustness, 2014, Analytical and Bioanalytical Chemistry.

[9] Sangjin Lee, et al. Detecting Similar Files Based on Hash and Statistical Analysis for Digital Forensic Investigation, 2009, 2009 2nd International Conference on Computer Science and its Applications.

[10] Maoqiang Xie, et al. Rank hash similarity for fast similarity search, 2013, Inf. Process. Manag.