Clustered File Type Identification

This thesis examines the possibility of extending current research on file type identification in digital forensics. It proposes a solution that combines unsupervised clustering with supervised classification. In experiments, the proposed solution increases the speed of file type identification, but at the cost of lower overall identification accuracy. A technique for unsupervised continuous learning is also presented, which lets the proposed solution adapt to its environment by learning from the test data while performing file type identification. In the best-case scenario, identification accuracy increases from 85.8% to 90.4% when the unsupervised continuous learning technique is used.
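
As a rough illustration of how clustering, classification, and continuous learning could fit together, the following minimal Python sketch may help. It is not the thesis's implementation: the byte-frequency features, the plain k-means clustering, the nearest-centroid classifier, the running-mean update rule, and all names (`byte_histogram`, `kmeans`, `ClusteredIdentifier`) are assumptions chosen for brevity.

```python
from collections import defaultdict
import math

def byte_histogram(data):
    """Normalized byte-frequency vector with 256 bins (assumed feature)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1
    return [c / total for c in counts]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def kmeans(vectors, k, iterations=10):
    """Plain k-means; the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = defaultdict(list)
        for v in vectors:
            idx = min(range(k), key=lambda i: distance(v, centroids[i]))
            groups[idx].append(v)
        # Keep the old centroid if a cluster received no vectors.
        centroids = [mean(groups[i]) if groups[i] else centroids[i]
                     for i in range(k)]
    return centroids

class ClusteredIdentifier:
    """Hypothetical two-stage identifier: cluster first, classify second."""

    def __init__(self, samples, k):
        # samples: list of (bytes, label) pairs used for training.
        vectors = [byte_histogram(d) for d, _ in samples]
        # Stage 1 (unsupervised): group training vectors, ignoring labels.
        self.clusters = kmeans(vectors, k)
        # Stage 2 (supervised): one nearest-centroid model per cluster,
        # stored as label -> [centroid, sample_count].
        by_cluster = defaultdict(lambda: defaultdict(list))
        for v, (_, label) in zip(vectors, samples):
            by_cluster[self._nearest_cluster(v)][label].append(v)
        self.models = {
            c: {lab: [mean(vs), len(vs)] for lab, vs in labs.items()}
            for c, labs in by_cluster.items()
        }

    def _nearest_cluster(self, v):
        return min(range(len(self.clusters)),
                   key=lambda i: distance(v, self.clusters[i]))

    def identify(self, data, learn=False):
        """Return the predicted label, or None if the cluster has no model."""
        v = byte_histogram(data)
        model = self.models.get(self._nearest_cluster(v))
        if not model:
            return None
        # Only the file types seen in this cluster are compared.
        label = min(model, key=lambda lab: distance(v, model[lab][0]))
        if learn:
            # Assumed continuous-learning rule: fold the test sample into
            # the predicted type's centroid as a running mean.
            centroid, n = model[label]
            updated = [(c * n + x) / (n + 1) for c, x in zip(centroid, v)]
            model[label] = [updated, n + 1]
        return label
```

In this sketch the speed-up comes from comparing a file only against the file types present in its nearest cluster. The same restriction can cost accuracy when a file falls into a cluster that does not contain its true type. Calling `identify(data, learn=True)` updates the model from unlabeled test data, so a wrong prediction is also folded into the model; that trade-off is inherent to this assumed update rule.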