Steganalysis in high dimensions: fusing classifiers built on random subspaces

By working with high-dimensional representations of covers, modern steganographic methods can preserve a large number of complex dependencies among individual cover elements and thus evade detection by the best current steganalyzers. Inevitably, steganalysis needs to start using high-dimensional feature sets as well. This raises two key problems: constructing good high-dimensional features, and finding machine learning methods that scale well with dimensionality. Depending on the classifier, high dimensionality may lead to a shortage of training data, infeasibly high training complexity, degraded generalization, lack of robustness to the cover source, and saturation of performance below its potential. To address these problems, collectively known as the curse of dimensionality, we propose ensemble classifiers as an alternative to the much more complex support vector machines. Based on the character of the media being analyzed, the steganalyst first puts together a high-dimensional set of diverse "prefeatures" selected to capture dependencies among individual cover elements. Then, a family of weak classifiers is built on random subspaces of the prefeature space. The final classifier is constructed by fusing the decisions of the individual classifiers. The advantages of this approach are its universality, low complexity, simplicity, and improved performance compared to classifiers trained on the entire prefeature set. Experiments with the steganographic algorithms nsF5 and HUGO demonstrate the usefulness of this approach over the current state of the art.
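
The pipeline described above (weak classifiers trained on random subspaces of the prefeature space, then fused) can be illustrated with a minimal sketch. Several details here are assumptions, since the abstract does not specify them: the weak learner is taken to be a Fisher linear discriminant, fusion is done by simple majority vote, and the data are synthetic Gaussian "cover" and "stego" vectors rather than real image features. The function names and parameters (`train_fld`, `train_ensemble`, `predict_ensemble`, `make_toy_data`, `d_sub`, `n_learners`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def train_fld(X, y, ridge=1e-6):
    """Fit a Fisher linear discriminant (assumed weak learner) on labels 0/1."""
    mu0 = X[y == 0].mean(axis=0)
    mu1 = X[y == 1].mean(axis=0)
    # Pooled within-class scatter, with a small ridge term for numerical stability.
    Xc = np.vstack([X[y == 0] - mu0, X[y == 1] - mu1])
    Sw = Xc.T @ Xc + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    # Place the decision threshold midway between the projected class means.
    b = -0.5 * w @ (mu0 + mu1)
    return w, b


def train_ensemble(X, y, n_learners=51, d_sub=20, seed=0):
    """Train one weak learner per randomly chosen d_sub-dimensional subspace."""
    rng = np.random.default_rng(seed)
    learners = []
    for _ in range(n_learners):
        # Random subspace: a subset of prefeature indices, drawn without replacement.
        idx = rng.choice(X.shape[1], size=d_sub, replace=False)
        w, b = train_fld(X[:, idx], y)
        learners.append((idx, w, b))
    return learners


def predict_ensemble(learners, X):
    """Fuse the individual binary decisions by majority vote (assumed fusion rule)."""
    votes = np.stack([(X[:, idx] @ w + b > 0).astype(int) for idx, w, b in learners])
    return (2 * votes.sum(axis=0) > len(learners)).astype(int)


def make_toy_data(n, dim, shift, seed):
    """Synthetic stand-in for prefeatures: class 1 has its mean shifted by `shift`."""
    rng = np.random.default_rng(seed)
    cover = rng.normal(0.0, 1.0, size=(n, dim))
    stego = rng.normal(shift, 1.0, size=(n, dim))
    X = np.vstack([cover, stego])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_toy_data(300, 200, 0.3, seed=1)
    X_test, y_test = make_toy_data(200, 200, 0.3, seed=2)
    learners = train_ensemble(X_train, y_train)
    accuracy = (predict_ensemble(learners, X_test) == y_test).mean()
    print(f"ensemble accuracy on toy data: {accuracy:.3f}")
```

Each weak learner only inverts a `d_sub` by `d_sub` scatter matrix, so the training cost grows with the subspace dimension and the number of learners rather than with the full prefeature dimensionality, which is one plausible reading of the "low complexity" claim. An odd `n_learners` avoids ties in the vote.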
