Statistical characterisation of MP3 encoders for steganalysis

This paper outlines a strategy to discriminate between different ISO/MPEG 1 Audio Layer-3 (MP3) encoding programs by the statistical particularities of the compressed audio streams they produce. We use Bayesian logic to deduce the most probable encoder on the basis of a feature vector that can be extracted from arbitrary MP3 files. The features used for the classification are discussed, and example results are given for sets of test data from 20 different codecs. Possible applications include advances in information hiding, increases in the reliability of steganographic attacks, and inferences about the origin of MP3 files for forensic purposes. We demonstrate that a pre-classification of MP3 encoders reduces the false alarm rate of a steganographic detection method. Implications for the generalisability of the proposed scheme to other file formats are addressed.
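To make the classification step concrete, the following is a minimal sketch of a naive Bayes classifier that picks the most probable encoder for a discretised feature vector. It is an illustration under stated assumptions, not the scheme's actual implementation: the feature values ("low", "padded", and so on), the encoder labels, the class name NaiveBayesEncoderClassifier, the conditional-independence assumption, and the Laplace smoothing parameter alpha are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEncoderClassifier:
    """Naive Bayes over categorical features (illustrative, hypothetical names)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant (assumption)
        self.class_counts = Counter()               # encoder -> number of training files
        self.feature_counts = defaultdict(Counter)  # (encoder, feature index) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training

    def fit(self, samples, labels):
        for features, encoder in zip(samples, labels):
            self.class_counts[encoder] += 1
            for i, value in enumerate(features):
                self.feature_counts[(encoder, i)][value] += 1
                self.feature_values[i].add(value)
        return self

    def log_posteriors(self, features):
        """Return unnormalised log posterior scores, one per encoder."""
        total = sum(self.class_counts.values())
        scores = {}
        for encoder, n in self.class_counts.items():
            score = math.log(n / total)             # log prior P(encoder)
            for i, value in enumerate(features):
                count = self.feature_counts[(encoder, i)][value]
                k = len(self.feature_values[i])
                # Smoothed estimate of P(feature_i = value | encoder)
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            scores[encoder] = score
        return scores

    def predict(self, features):
        scores = self.log_posteriors(features)
        return max(scores, key=scores.get)


# Toy training data: two hypothetical discretised features per file.
samples = [
    ("low", "padded"), ("low", "padded"), ("low", "unpadded"),
    ("high", "unpadded"), ("high", "unpadded"), ("high", "padded"),
]
labels = ["encoder_a"] * 3 + ["encoder_b"] * 3

clf = NaiveBayesEncoderClassifier().fit(samples, labels)
print(clf.predict(("low", "padded")))
print(clf.predict(("high", "unpadded")))
```

In a pre-classification setting, the predicted encoder could then be used to decide whether an encoder-specific steganographic detector is applicable to a given file at all, which is one plausible way the false alarm rate could be reduced.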