Ensemble Classifiers for Steganalysis of Digital Media

Today, the most accurate steganalysis methods for digital media are built as supervised classifiers on feature vectors extracted from the media. The machine learning tool of choice appears to be the support vector machine (SVM). In this paper, we propose an alternative and well-known machine learning tool, ensemble classifiers implemented as random forests, and argue that they are ideally suited for steganalysis. Ensemble classifiers scale much more favorably with respect to the number of training examples and the feature dimensionality, while delivering performance comparable to that of the much more complex SVMs. The significantly lower training complexity allows the steganalyst to work with rich (high-dimensional) cover models and to train on larger training sets, two key elements that appear necessary to reliably detect modern steganographic algorithms. Ensemble classification is portrayed here as a powerful developer tool that allows fast construction of steganography detectors with markedly improved detection accuracy across a wide range of embedding methods. The power of the proposed framework is demonstrated on three steganographic methods that hide messages in JPEG images.
