Joint Markov Blankets in Feature Sets Extracted from Wavelet Packet Decompositions

For two decades, wavelet packet decompositions have been shown to be effective as a generic approach to feature extraction from time series and images for the prediction of a target variable. Redundancies exist both between the wavelet coefficients and between the energy features derived from them. We assess these redundancies in wavelet packet decompositions by means of Markov blanket filtering theory. We introduce the concept of joint Markov blankets and show that they are a natural extension of Markov blankets, which are defined for single features, to sets of features. We show that joint Markov blankets exist in feature sets consisting of wavelet coefficients. Furthermore, we prove that the wavelet energy features from the highest frequency resolution level form a joint Markov blanket for all other wavelet energy features. The joint Markov blanket theory indicates that classification accuracy can be expected to increase as the frequency resolution level of the energy features increases.
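The energy-feature result can be made concrete with a small numerical check. The sketch below is an illustration under stated assumptions, not the authors' proof or code. It assumes an orthonormal filter pair (Haar here), takes the energy feature of a node to be the sum of its squared coefficients, and orders each level so that parent node k has children 2k and 2k+1. Under orthonormal filters the energy of a parent node equals the sum of the energies of its two children. Every coarser-level energy is therefore a deterministic function of the finest-level energies. A feature determined by a set M is conditionally independent of the class and of all remaining features given M, which is the property a (joint) Markov blanket requires. The function names (`haar_split`, `wavelet_packet`, `energies`, `coarse_from_finest`) are hypothetical.

```python
import math
import random


def haar_split(node):
    """Split one node into Haar approximation and detail children (orthonormal filters)."""
    s = 1 / math.sqrt(2)
    approx = [s * (node[i] + node[i + 1]) for i in range(0, len(node), 2)]
    detail = [s * (node[i] - node[i + 1]) for i in range(0, len(node), 2)]
    return approx, detail


def wavelet_packet(signal, levels):
    """Full packet tree: tree[j] holds the 2**j nodes of level j.

    Parent k at level j has children 2k (approximation) and 2k+1 (detail)
    at level j + 1. The signal length must be divisible by 2**levels.
    """
    tree = [[list(signal)]]
    for _ in range(levels):
        children = []
        for node in tree[-1]:
            children.extend(haar_split(node))
        tree.append(children)
    return tree


def energies(tree):
    """Assumed energy feature of every node: sum of squared coefficients."""
    return [[sum(c * c for c in node) for node in level] for level in tree]


def coarse_from_finest(finest, level, levels):
    """Recompute the level-`level` energies using only the finest-level energies.

    Node k at `level` covers the contiguous block of 2**(levels - level)
    descendants at the finest level, and energy is conserved down the tree.
    """
    block = 2 ** (levels - level)
    return [sum(finest[k * block:(k + 1) * block]) for k in range(2 ** level)]


if __name__ == "__main__":
    random.seed(0)
    J = 3
    x = [random.gauss(0, 1) for _ in range(2 ** J * 4)]  # 32 samples
    E = energies(wavelet_packet(x, J))
    for j in range(J):
        recomputed = coarse_from_finest(E[J], j, J)
        assert all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(E[j], recomputed))
    print("coarser energies are a function of the finest-level energies")
```

The check depends on energy conservation, so it is specific to orthogonal wavelets. With non-orthogonal filters (for example biorthogonal ones), a parent's energy does not in general equal the sum of its children's energies, and this argument would not carry over directly.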