Improved Decision Tree Method for Imbalanced Data Sets in Digital Forensics

This study presents an improved ID3 decision tree algorithm suited to digital forensics. Forensic data are imbalanced, inconstant, noisy, and dispersed. Based on these characteristics, we improve the ID3 algorithm by adopting a correction factor and two-times information gain, which avoids the large data bias of the ID3 algorithm. The experimental results show that the improved algorithm is simpler and has a lower error rate than ID3. These results indicate that the improved method is feasible for use in the digital forensics process.
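
For readers unfamiliar with the baseline, the following is a minimal sketch of the standard ID3 information gain criterion only. It does not reproduce the correction factor or the two-times information gain of the improved method, whose definitions are not given in this abstract. The toy records, the attribute names `protocol` and `record_id`, and the class labels are hypothetical and chosen only to show one known property of plain ID3 on an imbalanced sample: an attribute with many distinct values receives the highest gain even when it carries no generalizable information.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Standard ID3 information gain of splitting on one attribute."""
    total = len(labels)
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical imbalanced sample: 6 "normal" records, 2 "suspicious" records.
rows = [
    {"protocol": "tcp", "record_id": 1},
    {"protocol": "tcp", "record_id": 2},
    {"protocol": "tcp", "record_id": 3},
    {"protocol": "tcp", "record_id": 4},
    {"protocol": "udp", "record_id": 5},
    {"protocol": "udp", "record_id": 6},
    {"protocol": "udp", "record_id": 7},
    {"protocol": "udp", "record_id": 8},
]
labels = ["normal"] * 6 + ["suspicious"] * 2

gain_protocol = information_gain(rows, labels, "protocol")
gain_record_id = information_gain(rows, labels, "record_id")

# Plain ID3 would pick record_id, although a unique identifier cannot generalize.
print("protocol gain:", round(gain_protocol, 4))
print("record_id gain:", round(gain_record_id, 4))
```

In this sketch the unique `record_id` attribute splits the sample into single-record groups, so its gain equals the full class entropy and exceeds the gain of `protocol`. Any correction applied on top of this criterion would change the two scores; how the improved method does so is left to the body of the paper.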