Study of cover source mismatch in steganalysis and ways to mitigate its impact

When a steganalysis detector trained on one cover source is applied to images from a different source, the detection error generally increases because of the mismatch between the two sources. In steganography, this situation is known as cover source mismatch (CSM). The drop in detection accuracy depends on many factors, including the properties of both sources, the detector construction, the feature space used to represent the covers, and the steganographic algorithm. Although well recognized as the single most important factor negatively affecting the performance of steganalyzers in practice, the CSM has received surprisingly little attention from researchers. One reason is the diversity of ways in which the CSM can manifest. In a series of experiments in the spatial and JPEG domains, we refute some common misconceptions that tie the severity of the CSM to the dimensionality of the features or to their “fragility.” The impact of the CSM on detection appears too difficult to predict because of complex dependencies among the features. We also investigate ways to mitigate the negative effect of the CSM using simple measures, such as increasing the diversity of the training set (training on a mixture of sources) and employing a bank of detectors trained on multiple different sources, with testing performed by the detector trained on the closest source.
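The two mitigation strategies named above can be illustrated with a minimal sketch. It is not the authors' implementation: the regularized Fisher linear discriminant, the toy Gaussian features produced by the hypothetical helper `make_source`, and the centroid-distance rule used to pick the "closest" source are all assumptions made only for illustration, since the abstract does not specify the classifier or the closeness measure.

```python
import numpy as np

def train_fld(cover, stego, ridge=1e-3):
    """Fit a regularized Fisher linear discriminant (assumed detector type)."""
    mu_c, mu_s = cover.mean(axis=0), stego.mean(axis=0)
    # Within-class scatter plus a ridge term for numerical stability.
    sw = np.cov(cover, rowvar=False) + np.cov(stego, rowvar=False)
    sw = sw + ridge * np.eye(cover.shape[1])
    w = np.linalg.solve(sw, mu_s - mu_c)
    b = -0.5 * w @ (mu_c + mu_s)  # threshold halfway between projected means
    return w, b

def predict(model, x):
    """Return 1 for stego, 0 for cover."""
    w, b = model
    return (x @ w + b > 0).astype(int)

def train_mixture(sources):
    """Strategy 1: a single detector trained on the union of all sources."""
    cover = np.vstack([c for c, _ in sources.values()])
    stego = np.vstack([s for _, s in sources.values()])
    return train_fld(cover, stego)

def train_bank(sources):
    """Strategy 2: one detector per source, stored with a feature centroid."""
    bank = {}
    for name, (cover, stego) in sources.items():
        centroid = np.vstack([cover, stego]).mean(axis=0)
        bank[name] = (train_fld(cover, stego), centroid)
    return bank

def closest_source(bank, test_features):
    """Assumed closeness rule: smallest Euclidean distance between centroids."""
    t = test_features.mean(axis=0)
    return min(bank, key=lambda name: np.linalg.norm(bank[name][1] - t))

def make_source(rng, offset, n=200, dim=4, shift=0.5):
    """Hypothetical toy source: Gaussian cover features and shifted stego features."""
    cover = rng.normal(offset, 1.0, size=(n, dim))
    stego = rng.normal(offset + shift, 1.0, size=(n, dim))
    return cover, stego

rng = np.random.default_rng(0)
sources = {"A": make_source(rng, 0.0), "B": make_source(rng, 10.0)}
mixture_model = train_mixture(sources)
bank = train_bank(sources)

# Test images drawn from a source that resembles "B".
test_cover, test_stego = make_source(rng, 10.0)
test_x = np.vstack([test_cover, test_stego])
chosen = closest_source(bank, test_x)
bank_pred = predict(bank[chosen][0], test_x)
mixture_pred = predict(mixture_model, test_x)
```

In this toy setting the bank selects a detector by comparing feature centroids; any other notion of source similarity could be substituted in `closest_source` without changing the rest of the sketch.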
