Spectral feature projections that maximize Shannon mutual information with class labels
