Natural Image Statistics and Low-Complexity Feature Selection

Low-complexity feature selection is analyzed in the context of visual recognition. It is hypothesized that high-order dependencies of bandpass features contain little information for the discrimination of natural images. This hypothesis is characterized formally by introducing the concepts of conjunctive interference and decomposability order of a feature set. Necessary and sufficient conditions for the feasibility of low-complexity feature selection are then derived in terms of these concepts. It is shown that the intrinsic complexity of feature selection is determined by the decomposability order of the feature set, not by its dimension. Feature selection algorithms are then derived for all levels of complexity and are shown to be approximated by existing information-theoretic methods, which they consistently outperform. The new algorithms are also used to objectively test the hypothesis of low decomposability order through comparisons of classification performance. It is shown that, for image classification, the gain from modeling feature dependencies has strongly diminishing returns: best results are obtained under the assumption of decomposability order 1. This suggests a generic law for bandpass features extracted from natural images: the effect, on the dependence of any two features, of observing any other feature is constant across image classes.
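As a rough illustration of what an order-1 (pairwise) criterion looks like in practice, the sketch below implements a greedy selector that scores each candidate feature by its marginal mutual information with the class label minus its average pairwise mutual information with already selected features, in the spirit of the mRMR/CMIM-style information-theoretic methods that the abstract describes as approximations of the proposed algorithms. The histogram-based mutual information estimator, the bin count, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Histogram estimate of I(X;Y) in nats for two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def greedy_order1_selection(X, y, k, bins=16):
    """Greedily pick k columns of X using only marginal and pairwise MI terms.

    Candidate j, given the selected set S, is scored as
        I(X_j; Y) - (1/|S|) * sum_{i in S} I(X_j; X_i),
    i.e. relevance minus average pairwise redundancy (mRMR-style).
    """
    n_features = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y, bins) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, i], bins) for i in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Toy usage: 200 samples, 20 heavy-tailed "bandpass-like" features, binary labels;
# the first 3 features carry the class-dependent shift and should be preferred.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, 200)
    X = rng.laplace(size=(200, 20)) + 0.5 * y[:, None] * (np.arange(20) < 3)
    print(greedy_order1_selection(X, y, k=5))
```

Because the score uses only marginal and pairwise mutual information terms, the selector never needs to estimate joint densities over more than two features at a time, which is the computational point of assuming decomposability order 1.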
