Information, Divergence and Risk for Binary Experiments

We unify f-divergences, Bregman divergences, surrogate regret bounds, proper scoring rules, cost curves, ROC curves and statistical information. We do this by systematically studying integral and variational representations of these objects, and in so doing we identify their representation primitives, all of which are related to cost-sensitive binary classification. As well as developing relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate regret bounds and to generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint also illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates maximum mean discrepancy to Fisher linear discriminants.
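
As a small illustration of the kind of relationship the abstract refers to, the sketch below checks the classical Pinsker inequality, KL(P||Q) >= V(P,Q)^2 / 2, for two discrete distributions. It is not the generalised inequality developed in the paper. The function names are hypothetical, and it assumes the convention V(P,Q) = sum_i |p_i - q_i| (so V lies in [0, 2]), natural logarithms, and q_i > 0 wherever p_i > 0.

```python
import math


def kl_divergence(p, q):
    """KL(P||Q) in nats; assumes q_i > 0 wherever p_i > 0 (hypothetical helper)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def variational_divergence(p, q):
    """V(P,Q) = sum_i |p_i - q_i|, under the assumed [0, 2] normalisation."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def pinsker_gap(p, q):
    """KL(P||Q) - V(P,Q)^2 / 2; non-negative by the classical Pinsker inequality."""
    return kl_divergence(p, q) - 0.5 * variational_divergence(p, q) ** 2


# Example binary experiment: two distributions over a two-element set.
p = [0.7, 0.3]
q = [0.4, 0.6]

if __name__ == "__main__":
    print("KL :", kl_divergence(p, q))
    print("V  :", variational_divergence(p, q))
    print("gap:", pinsker_gap(p, q))
```

The gap is small but positive for this pair, which shows how close the classical bound can be on a two-element set.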
