暂无分享,去创建一个
[1] Kenji Yamanishi,et al. Efficient Computation of Normalized Maximum Likelihood Codes for Gaussian Mixture Models With Its Applications to Clustering , 2013, IEEE Transactions on Information Theory.
[2] A. Barron,et al. Robustly Minimax Codes for Universal Data Compression , 1998 .
[3] Peter L. Bartlett,et al. Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families , 2013, COLT.
[4] Nir Friedman,et al. Probabilistic Graphical Models - Principles and Techniques , 2009 .
[5] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning) , 2007 .
[6] Shin Matsushima,et al. Sparse Graphical Modeling via Stochastic Complexity , 2017, SDM.
[7] A. P. Dawid,et al. Present position and potential developments: some personal views , 1984 .
[8] Ryan P. Adams,et al. Non-vacuous Generalization Bounds at the ImageNet Scale: a PAC-Bayesian Compression Approach , 2018, ICLR.
[9] Jorma Rissanen,et al. Universal coding, information, prediction, and estimation , 1984, IEEE Trans. Inf. Theory.
[10] Jilles Vreeken,et al. Finding Good Itemsets by Packing Data , 2008, 2008 Eighth IEEE International Conference on Data Mining.
[11] Ivo Grosse,et al. Robust learning of inhomogeneous PMMs , 2014, AISTATS.
[12] Jorma Rissanen,et al. Fisher information and stochastic complexity , 1996, IEEE Trans. Inf. Theory.
[13] John Langford,et al. Suboptimal Behavior of Bayes and MDL in Classification Under Misspecification , 2004, COLT.
[14] Jilles Vreeken,et al. Krimp: mining itemsets that compress , 2011, Data Mining and Knowledge Discovery.
[15] Thijs van Ommen,et al. Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It , 2014, 1412.3730.
[16] Peter Grünwald,et al. Jeffreys versus Shtarkov distributions associated with some natural exponential families , 2010 .
[17] P. Grünwald,et al. Almost the best of three worlds: Risk, consistency and optional stopping for the switch criterion in nested model selection , 2018 .
[18] Kenji Yamanishi,et al. The decomposed normalized maximum likelihood code-length criterion for selecting hierarchical latent variable models , 2019, Data Mining and Knowledge Discovery.
[19] John Langford,et al. PAC-MDL Bounds , 2003, COLT.
[20] Wai Lam,et al. LEARNING BAYESIAN BELIEF NETWORKS: AN APPROACH BASED ON THE MDL PRINCIPLE , 1994, Comput. Intell..
[21] David Heckerman,et al. A Characterization of the Dirichlet Distribution with Application to Learning Bayesian Networks , 1995, UAI.
[22] P. Grünwald,et al. Catching up faster by switching sooner: a predictive approach to adaptive estimation with an application to the AIC–BIC dilemma , 2012 .
[23] Kailash Budhathoki,et al. Origo: causal inference by compression , 2016, Knowledge and Information Systems.
[24] Jorma Rissanen,et al. Stochastic Complexity in Statistical Inquiry , 1989, World Scientific Series in Computer Science.
[25] Peter Grünwald,et al. Safe Probability , 2016, ArXiv.
[26] Mark A. Pitt,et al. Advances in Minimum Description Length: Theory and Applications , 2005 .
[27] Wojciech Szpankowski,et al. Minimax Pointwise Redundancy for Memoryless Models Over Large Alphabets , 2012, IEEE Transactions on Information Theory.
[28] Tomi Silander,et al. Quotient Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures , 2018, AISTATS.
[29] Kazuho Watanabe,et al. Achievability of asymptotic minimax regret by horizon-dependent and horizon-independent strategies , 2015, J. Mach. Learn. Res..
[30] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[31] Jun'ichi Takeuchi,et al. Barron and Cover's Theory in Supervised Learning and its Application to Lasso , 2016, ICML.
[32] Kailash Budhathoki,et al. Origo: causal inference by compression , 2017, Knowledge and Information Systems.
[33] Peter Grünwald,et al. The Safe Bayesian - Learning the Learning Rate via the Mixability Gap , 2012, ALT.
[34] Daniel F. Schmidt,et al. Subset Selection in Linear Regression using Sequentially Normalized Least Squares: Asymptotic Theory , 2016 .
[35] Steven de Rooij,et al. Catching Up Faster in Bayesian Model Selection and Model Averaging , 2007, NIPS.
[36] Danai Koutra,et al. Summarizing and understanding large graphs , 2015, Stat. Anal. Data Min..
[37] Peter Grünwald. Viewing all models as “probabilistic” , 1999, COLT '99.
[38] Jorma Rissanen,et al. Information and Complexity in Statistical Modeling , 2006, ITW.
[39] Kenji Yamanishi,et al. An Upper Bound on Normalized Maximum Likelihood Codes for Gaussian Mixture Models , 2017, ArXiv.
[40] Jason M. Klusowski,et al. Finite-Sample Risk Bounds for Maximum Likelihood Estimation With Arbitrary Penalties , 2018, IEEE Transactions on Information Theory.
[41] L. Pericchi,et al. BAYES FACTORS AND MARGINAL DISTRIBUTIONS IN INVARIANT SITUATIONS , 2016 .
[42] A. Barron,et al. Estimation of mixture models , 1999 .
[43] Kazuho Watanabe,et al. Bayesian properties of normalized maximum likelihood and its fast computation , 2014, 2014 IEEE International Symposium on Information Theory.
[44] J. Rissanen,et al. ON SEQUENTIALLY NORMALIZED MAXIMUM LIKELIHOOD MODELS , 2008 .
[45] Atsushi Suzuki,et al. Exact Calculation of Normalized Maximum Likelihood Code Length Using Fourier Analysis , 2018, 2018 IEEE International Symposium on Information Theory (ISIT).
[46] A. Dawid. The geometry of proper scoring rules , 2007 .
[47] Sumio Watanabe,et al. A widely applicable Bayesian information criterion , 2012, J. Mach. Learn. Res..
[48] Jorma Rissanen,et al. Model selection by sequentially normalized least squares , 2010, J. Multivar. Anal..
[49] A. Barron,et al. THE MDL PRINCIPLE , PENALIZED LIKELIHOODS , AND STATISTICAL RISK , 2008 .
[50] Ming Li,et al. Minimum description length induction, Bayesianism, and Kolmogorov complexity , 1999, IEEE Trans. Inf. Theory.
[51] Geoffrey E. Hinton,et al. Keeping the neural networks simple by minimizing the description length of the weights , 1993, COLT '93.
[52] Peter Harremoes,et al. Finiteness of redundancy, regret, Shtarkov sums, and Jeffreys integrals in exponential families , 2009, 2009 IEEE International Symposium on Information Theory.
[53] Jorma Rissanen,et al. The Minimum Description Length Principle in Coding and Modeling , 1998, IEEE Trans. Inf. Theory.
[54] Teemu Roos,et al. Robust Sequential Prediction in Linear Regression with Student's t-distribution , 2016, ISAIM.
[55] Kenji Yamanishi,et al. High-dimensional penalty selection via minimum description length principle , 2018, Machine Learning.
[56] I. J. Myung,et al. Counting probability distributions: Differential geometry and model selection , 2000, Proc. Natl. Acad. Sci. USA.
[57] Robert Tibshirani,et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.
[58] Sumio Watanabe,et al. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory , 2010, J. Mach. Learn. Res..
[59] Andrew R. Barron,et al. Mixture Density Estimation , 1999, NIPS.
[60] Tom Sterkenburg. Universal Prediction: A Philosophical Investigation , 2018 .
[61] Carl E. Rasmussen,et al. Occam's Razor , 2000, NIPS.
[62] J. Rissanen,et al. Modeling By Shortest Data Description* , 1978, Autom..
[63] J. Berger,et al. Unified Conditional Frequentist and Bayesian Testing of Composite Hypotheses , 2003 .
[64] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[65] Tong Zhang. From ɛ-entropy to KL-entropy: Analysis of minimum information complexity density estimation , 2006, math/0702653.
[66] Tomi Silander,et al. Learning locally minimax optimal Bayesian networks , 2010, Int. J. Approx. Reason..
[67] Ivo Grosse,et al. Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data , 2015, BMC Bioinformatics.
[68] John Langford,et al. (Not) Bounding the True Error , 2001, NIPS.
[69] Manfred K. Warmuth,et al. The Last-Step Minimax Algorithm , 2000, ALT.
[70] Fumiyasu Komaki,et al. Relations Between the Conditional Normalized Maximum Likelihood Distributions and the Latent Information Priors , 2016, IEEE Transactions on Information Theory.
[71] Gintare Karolina Dziugaite,et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data , 2017, UAI.
[72] Teemu Roos. Monte Carlo estimation of minimax regret with an application to MDL model selection , 2008, 2008 IEEE Information Theory Workshop.
[73] Petri Myllymäki,et al. A linear-time algorithm for computing the multinomial stochastic complexity , 2007, Inf. Process. Lett..
[74] Tong Zhang,et al. Information-theoretic upper and lower bounds for statistical estimation , 2006, IEEE Transactions on Information Theory.
[75] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[76] Andrew R. Barron,et al. Improved MDL Estimators Using Local Exponential Family Bundles Applied to Mixture Families , 2019, 2019 IEEE International Symposium on Information Theory (ISIT).
[77] Zou. On Model Selection , Bayesian Networks , and the Fisher Information Integral , 2022 .
[78] C. S. Wallace,et al. An Information Measure for Classification , 1968, Comput. J..
[79] Ryan P. Adams,et al. Compressibility and Generalization in Large-Scale Deep Learning , 2018, ArXiv.
[80] Wojciech Kotlowski,et al. Prequential plug-in codes that achieve optimal redundancy rates even if the model is wrong , 2010, 2010 IEEE International Symposium on Information Theory.
[81] Peter Grünwald,et al. A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity , 2017, ALT.
[82] Andrew R. Barron,et al. MDL Procedures with ` 1 Penalty and their Statistical Risk , 2008 .
[83] Atsushi Suzuki,et al. Structure Selection for Convolutive Non-negative Matrix Factorization Using Normalized Maximum Likelihood Coding , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).
[84] R. Bouckaert. Minimum Description Length Principle , 1994 .
[85] E. Wagenmakers,et al. Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology , 2016 .
[86] David A. McAllester. PAC-Bayesian Stochastic Model Selection , 2003, Machine Learning.
[87] David Maxwell Chickering,et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.
[88] Rianne de Heide,et al. the safe-bayesian lasso , 2016 .
[89] Andrew R. Barron,et al. Minimum complexity density estimation , 1991, IEEE Trans. Inf. Theory.
[90] Andrew R. Barron,et al. Information theoretic validity of penalized likelihood , 2014, 2014 IEEE International Symposium on Information Theory.
[91] Yuhong Yang. Can the Strengths of AIC and BIC Be Shared , 2005 .
[92] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.