Beyond Fano's inequality: bounds on the optimal F-score, BER, and cost-sensitive risk and their implications
Ming-Jie Zhao | Narayanan Unny Edakunni | Adam Craig Pocock | Gavin Brown