A new approach to content-based file type detection

File type identification and file type clustering may be difficult tasks that have an increasingly importance in the field of computer and network security. Classical methods of file type detection including considering file extensions and magic bytes can be easily spoofed. Content-based file type detection is a newer way that is taken into account recently. In this paper, a new content-based method for the purpose of file type detection and file type clustering is proposed that is based on the PCA and neural networks. The proposed method has a good accuracy and is fast enough.

[1]  Terence D. Sanger,et al.  Optimal unsupervised learning in a single-layer linear feedforward neural network , 1989, Neural Networks.

[2]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[3]  Gregory B. White,et al.  An Approach to Detect Executable Content for Anomaly Based Network Intrusion Detection , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[4]  George Cybenko,et al.  Approximation by superpositions of a sigmoidal function , 1989, Math. Control. Signals Syst..

[5]  James George Dunham,et al.  Classifying file type of stream ciphers in depth using neural networks , 2005, The 3rd ACS/IEEE International Conference onComputer Systems and Applications, 2005..

[6]  Ke Wang,et al.  Fileprints: identifying file types by n-gram analysis , 2005, Proceedings from the Sixth Annual IEEE SMC Information Assurance Workshop.

[7]  N. Shahmehri,et al.  File Type Identification of Data Fragments by Their Binary Structure , 2006, 2006 IEEE Information Assurance Workshop.

[8]  Kurt Hornik,et al.  Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.

[9]  Mohammad Hossain Heydari,et al.  Content based file type detection algorithms , 2003, 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the.

[10]  T. Achalakul,et al.  File Type Classification for Adaptive Object File System , 2006, TENCON 2006 - 2006 IEEE Region 10 Conference.