Plug-in methods in classification

This manuscript studies several problems of classification under constraints. In this framework, the goal is to build an algorithm that performs as well as the best classification rule having a desired property. Plug-in classification methods turn out to be well suited to this purpose. Moreover, it is shown that in several settings these classification rules can exploit unlabeled data, that is, they are constructed in a semi-supervised manner. Chapter 1 describes two particular cases of binary classification: classification where the performance measure is tied to the F-score, and fair classification. Semi-supervised procedures are proposed for both problems. In the F-score case, the proposed method is shown to be minimax optimal over a standard class of nonparametric distributions. In the fair classification case, the proposed method is consistent in terms of classification risk while asymptotically satisfying the equal opportunity constraint. In addition, the procedure proposed in this setting outperforms state-of-the-art algorithms in practice. Chapter 3 describes the framework of multi-class classification by means of confidence sets. Here again, a semi-supervised procedure is proposed and its near-minimax optimality is established. It is further shown that no supervised algorithm can achieve a so-called fast rate of convergence. Chapter 4 describes a multi-label classification setting in which one seeks to minimize the false negative rate subject to almost-sure type constraints on the classification rules. Two specific constraints are considered in this part: sparse classifiers and classifiers subject to a control of the false negative errors. For the former, a supervised algorithm is provided and it is shown that this algorithm can achieve a fast rate of convergence. Finally, for the latter family, it is shown that additional assumptions are necessary to obtain theoretical guarantees on the classification risk.
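To make the idea of a semi-supervised plug-in rule for the F-score concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure analysed in the manuscript. It assumes that an estimate of the regression function P(Y=1 | X) has already been fitted on labeled data, and that only its values on an unlabeled sample are available. The threshold is then chosen to maximize a plug-in estimate of the F1-score, using the identity F1 = 2 P(Y=1, g(X)=1) / (P(Y=1) + P(g(X)=1)). The function names `plugin_f1_threshold` and `plugin_classify` are hypothetical.

```python
import numpy as np

def plugin_f1_threshold(eta_unlabeled):
    """Choose a threshold on estimated P(Y=1|X) maximizing a plug-in F1 estimate.

    eta_unlabeled: estimated regression-function values on an unlabeled sample
    (assumed to come from some estimator fitted on separate labeled data).
    """
    eta = np.sort(np.asarray(eta_unlabeled, dtype=float))[::-1]  # descending order
    n = eta.size
    # Predicting 1 on the k points with the largest eta values gives an
    # estimated true-positive probability equal to their summed eta over n.
    tp = np.cumsum(eta) / n
    p_pos = eta.mean()                  # plug-in estimate of P(Y=1)
    p_pred = np.arange(1, n + 1) / n    # fraction of points predicted positive
    f1 = 2 * tp / (p_pos + p_pred)      # plug-in F1 for each candidate cut
    k = int(np.argmax(f1))
    return eta[k], f1[k]

def plugin_classify(eta_new, threshold):
    """Predict 1 wherever the estimated regression function reaches the threshold."""
    return (np.asarray(eta_new, dtype=float) >= threshold).astype(int)

threshold, f1_estimate = plugin_f1_threshold([0.9, 0.8, 0.2, 0.1])
predictions = plugin_classify([0.85, 0.5], threshold)
```

No labels are needed in the thresholding step, which is why unlabeled data can be used there. Ties in the estimated values are not handled specially in this sketch.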
