Feature selection for steganalysis using the Mahalanobis distance

Steganalysis is used to detect hidden content in innocuous images. Many successful steganalysis algorithms use a large number of features relative to the size of the training set and therefore suffer from the "curse of dimensionality". High dimensionality of the feature space can reduce classification accuracy, obscure the features that matter most for classification, and increase computational complexity. This paper presents a filter-type feature selection algorithm that selects reduced feature sets using the Mahalanobis distance measure, and develops classifiers from those sets. The approach is applied to a well-known JPEG steganalyzer, that of Pevný et al. (SPIE, 2007), which combines DCT-based features and calibrated Markov features, and the experiments show that reduced-feature steganalyzers can be obtained that perform as well as the original steganalyzer. Five embedding algorithms are used. Our results demonstrate that as few as 10-60 features, at various levels of embedding, can be used to create a classifier that gives results comparable to the full suite of 274 features.
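The abstract does not spell out the selection procedure, so the following is only a minimal sketch of one way a Mahalanobis-distance filter could work, not the paper's algorithm. It assumes two feature matrices (cover and stego images, one row per image), scores each feature on its own by the Mahalanobis distance between the two class means under a pooled covariance, and keeps the top `k`. The function names, the per-feature ranking strategy, and the synthetic data are all hypothetical.

```python
import numpy as np


def class_separation(cover, stego, idx):
    """Mahalanobis distance between the cover and stego class means,
    restricted to the feature columns in idx (pooled covariance)."""
    a, b = cover[:, idx], stego[:, idx]
    diff = a.mean(axis=0) - b.mean(axis=0)
    na, nb = len(a), len(b)
    cov_a = np.atleast_2d(np.cov(a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False))
    pooled = ((na - 1) * cov_a + (nb - 1) * cov_b) / (na + nb - 2)
    # The pseudo-inverse keeps the score defined if the covariance is singular.
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))


def select_features(cover, stego, k):
    """Filter-style selection (an assumption, not the paper's exact method):
    score every feature individually and return the k best column indices."""
    n_features = cover.shape[1]
    scores = [class_separation(cover, stego, [j]) for j in range(n_features)]
    order = np.argsort(scores)[::-1]  # highest separation first
    return order[:k].tolist()


# Synthetic stand-in for real steganalysis features: 6 features per image,
# of which only columns 2 and 4 differ between the two classes.
rng = np.random.default_rng(0)
cover = rng.normal(size=(500, 6))
stego = rng.normal(size=(500, 6))
stego[:, 2] += 2.0
stego[:, 4] += 1.0

selected = select_features(cover, stego, k=2)
print(selected)
```

Because `class_separation` accepts any list of column indices, the same function can also score a whole candidate subset at once, which is how a multivariate variant of the filter would account for correlated features. The reduced columns would then be passed to a classifier in place of the full feature set.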

[1] Jessica J. Fridrich, et al. Feature-Based Steganalysis for JPEG Images and Its Implications for Future Design of Steganographic Schemes, 2004, Information Hiding.

[2] Wei-Chien Chang. On using Principal Components before Separating a Mixture of Two Multivariate Normal Distributions, 1983.

[3] David G. Stork, et al. Pattern Classification, 1973.

[4] Christian Cachin, et al. An information-theoretic model for steganography, 1998, Inf. Comput.

[5] Guorong Xuan, et al. Feature Selection based on the Bhattacharyya Distance, 2006, ICPR.

[6] Ingemar J. Cox, et al. Digital Watermarking and Steganography, 2014.

[7] Fenlin Liu, et al. A review on blind detection for image steganography, 2008, Signal Process.

[8] Christopher J. C. Burges, et al. A Tutorial on Support Vector Machines for Pattern Recognition, 1998, Data Mining and Knowledge Discovery.

[9] A. Caprihan, et al. Application of principal component analysis to distinguish patients with schizophrenia from healthy controls based on fractional anisotropy measurements, 2008, NeuroImage.

[10] Danh V. Nguyen, et al. On partial least squares dimension reduction for microarray-based classification: a simulation study, 2004, Comput. Stat. Data Anal.

[11] R. Hoffmann, et al. Geographical and Interspecific Cranial Variation in Big-Eared Ground Squirrels (Spermophilus): A Multivariate Study, 1975.

[12] Siwei Lyu, et al. Steganalysis using higher-order image statistics, 2006, IEEE Transactions on Information Forensics and Security.

[13] Amaury Lendasse, et al. A Feature Selection Methodology for Steganalysis, 2006, MRCS.

[14] Susanto Rahardja, et al. Steganalysis of Binary Cartoon Image using Distortion Measure, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[15] Ying Wang, et al. Optimized Feature Extraction for Learning-Based Image Steganalysis, 2007, IEEE Transactions on Information Forensics and Security.

[16] Andreas Westfeld, et al. F5-A Steganographic Algorithm, 2001, Information Hiding.

[17] Bryan F. J. Manly, et al. Multivariate Statistical Methods: A Primer, Third Edition, 1994.

[18] Yun Q. Shi, et al. A Markov Process Based Approach to Effective Attacking JPEG Steganography, 2006, Information Hiding.

[19] B. Manly. Multivariate Statistical Methods: A Primer, 1986.

[20] Nasir D. Memon, et al. Steganalysis of watermarking techniques using image quality metrics, 2001, IS&T/SPIE Electronic Imaging.

[21] Yun Q. Shi, et al. JPEG image steganalysis utilizing both intrablock and interblock correlations, 2008, 2008 IEEE International Symposium on Circuits and Systems.

[22] E. John, et al. Neurometrics: computer-assisted differential diagnosis of brain dysfunctions, 1988, Science.

[23] William A. Pearlman, et al. Steganalysis of additive-noise modelable information hiding, 2003, IS&T/SPIE Electronic Imaging.

[24] Amaury Lendasse, et al. Reliable Steganalysis Using a Minimum Set of Samples and Features, 2009, EURASIP J. Inf. Secur.

[25] Xuyu Xiang, et al. Principal feature selection and fusion method for image steganalysis, 2009, J. Electronic Imaging.

[26] Jessica J. Fridrich, et al. Blind Statistical Steganalysis of Additive Steganography Using Wavelet Higher Order Statistics, 2005, Communications and Multimedia Security.

[27] Qingzhong Liu, et al. Feature mining and pattern classification for steganalysis of LSB matching steganography in grayscale images, 2008, Pattern Recognit.

[28] Bin Li, et al. Universal JPEG steganalysis based on microscopic and macroscopic calibration, 2008, 2008 15th IEEE International Conference on Image Processing.

[29] Andrew D. Ker, et al. Feature reduction and payload location with WAM steganalysis, 2009, Electronic Imaging.

[30] Anil K. Jain, et al. Statistical Pattern Recognition: A Review, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[31] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[32] Niels Provos, et al. Defending Against Statistical Steganalysis, 2001, USENIX Security Symposium.

[33] Rangding Wang, et al. Improvement of TND steganalysis based on image complexity, 2008, 2008 9th International Conference on Signal Processing.

[34] Ross J. Anderson, et al. On the limits of steganography, 1998, IEEE J. Sel. Areas Commun.

[35] Tomás Pevný, et al. Merging Markov and DCT features for multi-class JPEG steganalysis, 2007, Electronic Imaging.