How to find relevant training data: A paired bootstrapping approach to blind steganalysis

Today, support vector machines (SVMs) seem to be the classifier of choice in blind steganalysis. This approach involves two steps: first, a training phase determines a separating hyperplane that distinguishes cover from stego images; second, a test phase assigns an unknown input image to one of the two classes using this hyperplane. As with all statistical classifiers, the number of training images is a critical factor: the more images used in the training phase, the better the steganalysis performance in the test phase, but at the price of a greatly increased training time for the SVM algorithm. Interestingly, only a few of the training samples, the support vectors, determine the separating hyperplane of the SVM. In this paper, we introduce a paired bootstrapping approach, developed specifically for the steganalysis scenario, that selects likely candidates for support vectors. The resulting training set is considerably smaller, without a significant loss of steganalysis performance.
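The general idea sketched in the abstract can be illustrated with a small, hedged example: train SVMs on paired bootstrap subsamples (each cover image together with its stego counterpart), collect the resulting support vectors as candidates, and fit the final SVM only on those candidates. The synthetic Gaussian features, subsample sizes, and kernel settings below are illustrative assumptions, not the paper's actual features or parameters.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for cover/stego feature vectors; the paper works on
# image features, so these Gaussians are purely illustrative.
n = 400
cover = rng.normal(0.0, 1.0, size=(n, 8))
stego = rng.normal(0.6, 1.0, size=(n, 8))
X = np.vstack([cover, stego])
y = np.array([0] * n + [1] * n)

# Bootstrap rounds: train a small SVM on a paired subsample and keep the
# indices of its support vectors as candidates for the final training set.
candidates = set()
for _ in range(10):
    idx = rng.choice(n, size=100, replace=False)
    # Paired subsample: cover row i is drawn together with stego row i + n.
    sub = np.concatenate([idx, idx + n])
    svm = SVC(kernel="rbf", C=1.0).fit(X[sub], y[sub])
    candidates.update(sub[svm.support_])

# Final SVM trained only on the collected support-vector candidates,
# a much smaller set than the full 2 * n training images.
cand = np.array(sorted(candidates))
final_svm = SVC(kernel="rbf", C=1.0).fit(X[cand], y[cand])
print(len(cand), final_svm.score(X, y))
```

In this sketch the candidate set is typically far smaller than the full training set, while the final classifier's accuracy stays close to that of an SVM trained on all samples; the paper's paired bootstrapping refines how those candidates are selected.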