Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
[1] Martin J. Wainwright,et al. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso) , 2009, IEEE Transactions on Information Theory.
[2] Tong Zhang,et al. On the Consistency of Feature Selection using Greedy Least Squares Regression , 2009, J. Mach. Learn. Res..
[3] Dean P. Foster,et al. Feature Selection using Multiple Streams , 2010, AISTATS.
[4] Martha Palmer,et al. Class-Based Construction of a Verb Lexicon , 2000, AAAI/IAAI.
[5] N. Meinshausen,et al. High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.
[6] Peter Grünwald,et al. A tutorial introduction to the minimum description length principle , 2004, ArXiv.
[7] A. Rinaldo,et al. On the asymptotic properties of the group lasso estimator for linear models , 2008 .
[8] Tong Zhang,et al. Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models , 2008, NIPS.
[9] H. Akaike,et al. Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .
[10] P. Zhao,et al. Grouped and Hierarchical Model Selection through Composite Absolute Penalties , 2007 .
[11] Barbara B. Levin,et al. English verb classes and alternations , 1993 .
[12] Martha Palmer,et al. Verbnet: a broad-coverage, comprehensive verb lexicon , 2005 .
[13] P. Bühlmann,et al. The group lasso for logistic regression , 2008 .
[14] Harrison H. Zhou,et al. Model selection and sharp asymptotic minimaxity , 2013 .
[15] Martha Palmer,et al. An Empirical Study of the Behavior of Active Learning for Word Sense Disambiguation , 2006, NAACL.
[16] R. F.,et al. Mathematical Statistics , 1944, Nature.
[17] Joel A. Tropp,et al. Greed is good: algorithmic results for sparse approximation , 2004, IEEE Transactions on Information Theory.
[18] Tong Zhang,et al. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data , 2005, J. Mach. Learn. Res..
[19] Peng Zhao,et al. On Model Selection Consistency of Lasso , 2006, J. Mach. Learn. Res..
[20] Tony Jebara,et al. Multi-task feature and kernel selection for SVMs , 2004, ICML.
[21] 김두식,et al. English Verb Classes and Alternations , 2006 .
[22] Stephen J. Wright,et al. Simultaneous Variable Selection , 2005, Technometrics.
[23] Jean-Philippe Vert,et al. Group lasso with overlap and graph lasso , 2009, ICML '09.
[24] Martha Palmer,et al. Towards Robust High Performance Word Sense Disambiguation of English Verbs Using Rich Linguistic Features , 2005, IJCNLP.
[25] Mitchell P. Marcus,et al. OntoNotes: The 90% Solution , 2006, NAACL.
[26] Han Liu,et al. On the $\ell_1$-$\ell_q$ Regularized Regression , 2008 .
[27] Balas K. Natarajan,et al. Sparse Approximate Solutions to Linear Systems , 1995, SIAM J. Comput..
[28] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning) , 2007 .
[29] Lyle H. Ungar,et al. Transfer Learning, Feature Selection and Word Sense Disambiguation , 2009, ACL/IJCNLP.
[30] Craig M. Pease. In Defense of N > 1 , 2005 .
[31] Junzhou Huang,et al. Learning with structured sparsity , 2009, ICML '09.
[32] Dean P. Foster,et al. Multi-task Feature Selection Using the Multiple Inclusion Criterion (MIC) , 2009, ECML/PKDD.
[33] Jorma Rissanen,et al. Hypothesis Selection and Testing by the MDL Principle , 1999, Comput. J..
[34] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[35] Rajat Raina,et al. Constructing informative priors using transfer learning , 2006, ICML.
[36] J. Rissanen,et al. Modeling By Shortest Data Description , 1978, Autom..
[37] David Yarowsky,et al. Modeling Consensus: Classifier Combination for Word Sense Disambiguation , 2002, EMNLP.
[38] George A. Miller,et al. Introduction to WordNet: An On-line Lexical Database , 1990 .
[39] Vasileios Kandylas,et al. Finding cohesive clusters for analyzing knowledge communities , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[40] Karim Lounici. Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators , 2008, 0801.4610.
[41] Andrew R. Barron,et al. Minimum complexity density estimation , 1991, IEEE Trans. Inf. Theory.
[42] Peter Secretan. Learning , 1965, Mental Health.
[43] Stuart L Schreiber,et al. Genetic basis of individual differences in the response to small-molecule drugs in yeast , 2007, Nature Genetics.
[44] Jorma Rissanen,et al. The Minimum Description Length Principle in Coding and Modeling , 1998, IEEE Trans. Inf. Theory.
[45] Dean P. Foster,et al. The risk inflation criterion for multiple regression , 1994 .
[46] Massimiliano Pontil,et al. Convex multi-task feature learning , 2008, Machine Learning.
[47] Rie Kubota Ando,et al. Applying Alternating Structure Optimization to Word Sense Disambiguation , 2006, CoNLL.
[48] Yudong D. He,et al. Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.
[49] Ben Taskar,et al. Joint covariate selection and joint subspace selection for multiple classification problems , 2010, Stat. Comput..
[50] Chih-Jen Lin,et al. LIBSVM: A library for support vector machines , 2011, TIST.
[51] Tong Zhang,et al. On the Convergence of MDL Density Estimation , 2004, COLT.
[52] M. Daly,et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes , 2003, Nature Genetics.
[53] H. Zou,et al. Regularization and variable selection via the elastic net , 2005 .
[54] Rich Caruana,et al. Multitask Learning , 1997, Machine-mediated learning.
[55] R. Tibshirani,et al. Least angle regression , 2004, math/0406456.
[56] Shuheng Zhou,et al. Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation , 2009, NIPS.
[57] Daphne Koller,et al. Learning a meta-level prior for feature relevance from multiple related tasks , 2007, ICML '07.
[58] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .
[59] Dean P. Foster,et al. Efficient Feature Selection in the Presence of Multiple Feature Classes , 2008, 2008 Eighth IEEE International Conference on Data Mining.
[60] J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length , 1983 .
[61] Dekang Lin,et al. Review of WordNet: an electronic lexical database by Christiane Fellbaum. The MIT Press 1998. , 1999 .
[62] M. Yuan,et al. Model selection and estimation in regression with grouped variables , 2006 .
[63] Francis R. Bach,et al. Consistency of the group Lasso and multiple kernel learning , 2007, J. Mach. Learn. Res..
[64] Michael I. Jordan,et al. Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.