Paper Information - Sparse Bilinear Logistic Regression

Sparse Bilinear Logistic Regression

In this paper, we introduce the concept of sparse bilinear logistic regression for decision problems involving explanatory variables that are two-dimensional matrices. Such problems are common in computer vision, brain-computer interfaces, style/content factorization, and parallel factor analysis. The underlying optimization problem is biconvex; we study its solution and develop an efficient algorithm based on block coordinate descent. We provide a theoretical guarantee for global convergence and estimate the asymptotic convergence rate using the Kurdyka-Łojasiewicz inequality. A range of experiments with simulated and real data demonstrates that sparse bilinear logistic regression outperforms current techniques in several important applications.
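To make the biconvex structure concrete, the following is a minimal sketch of block coordinate descent on a rank-1 bilinear logistic model. It is not the authors' algorithm: the rank-1 score u^T X v + b, the plain l1 penalties, the step-size rule, and every name in the code (`fit_sparse_bilinear_lr`, `prox_grad_block`, `lam_u`, `lam_v`) are assumptions made for illustration. The point is only that with v fixed the problem is an ordinary sparse logistic regression in u (and vice versa), so each block can take a proximal-gradient step.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, y, u, v, b, lam_u, lam_v):
    """Mean logistic loss of the bilinear score plus l1 penalties."""
    scores = np.einsum("s,nst,t->n", u, X, v) + b
    loss = np.mean(np.logaddexp(0.0, -y * scores))
    return loss + lam_u * np.abs(u).sum() + lam_v * np.abs(v).sum()

def prox_grad_block(F, y, w, b, lam):
    """One proximal-gradient step on w, where the score is F @ w + b."""
    n = F.shape[0]
    margins = y * (F @ w + b)
    sig_neg = np.exp(-np.logaddexp(0.0, margins))  # sigmoid(-margin), stable
    grad = -(F.T @ (y * sig_neg)) / n
    # Upper bound on the gradient's Lipschitz constant for this block.
    L = max(np.linalg.norm(F, 2) ** 2 / (4.0 * n), 1e-12)
    return soft_threshold(w - grad / L, lam / L)

def fit_sparse_bilinear_lr(X, y, lam_u=0.01, lam_v=0.01, n_sweeps=100, seed=0):
    """X: (n, s, t) matrix-valued samples; y: (n,) labels in {-1, +1}."""
    n, s, t = X.shape
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(s) / np.sqrt(s)
    v = rng.standard_normal(t) / np.sqrt(t)
    b = 0.0
    history = [objective(X, y, u, v, b, lam_u, lam_v)]
    for _ in range(n_sweeps):
        # u-block: with v fixed, sample i has the feature vector X_i v.
        u = prox_grad_block(X @ v, y, u, b, lam_u)
        # v-block: with u fixed, sample i has the feature vector X_i^T u.
        v = prox_grad_block(np.einsum("nst,s->nt", X, u), y, v, b, lam_v)
        # Bias: plain gradient step; the 1-D Lipschitz constant is 1/4.
        margins = y * (np.einsum("s,nst,t->n", u, X, v) + b)
        grad_b = -np.mean(y * np.exp(-np.logaddexp(0.0, margins)))
        b -= 4.0 * grad_b
        history.append(objective(X, y, u, v, b, lam_u, lam_v))
    return u, v, b, history

# Synthetic demo: labels come from a sparse rank-1 bilinear rule.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8, 6))
u_true = np.array([1.0, -1.0, 0, 0, 0, 0, 0, 0])
v_true = np.array([1.0, 0, 0, 1.0, 0, 0])
y = np.sign(np.einsum("s,nst,t->n", u_true, X, v_true))
y[y == 0] = 1.0

u_hat, v_hat, b_hat, history = fit_sparse_bilinear_lr(X, y)
pred = np.sign(np.einsum("s,nst,t->n", u_hat, X, v_hat) + b_hat)
print("objective: %.4f -> %.4f" % (history[0], history[-1]))
print("training accuracy:", np.mean(pred == y))
```

Because each block step uses a step size of 1/L with L an upper bound on that block's Lipschitz constant, the objective recorded in `history` does not increase from sweep to sweep. The problem is nonconvex jointly in (u, v), so the point reached depends on the initialization.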
