Federated Learning with Only Positive Labels

We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class. As a result, during each federated learning round, the users need to locally update the classifier without having access to the features and the model parameters for the negative classes. Thus, naively employing conventional decentralized learning such as the distributed SGD or Federated Averaging may lead to trivial or extremely poor classifiers. In particular, for the embedding based classifiers, all the class embeddings might collapse to a single point. To address this problem, we propose a generic framework for training with only positive labels, namely Federated Averaging with Spreadout (FedAwS), where the server imposes a geometric regularizer after each round to encourage classes to be spreadout in the embedding space. We show, both theoretically and empirically, that FedAwS can almost match the performance of conventional learning where users have access to negative labels. We further extend the proposed method to the settings with large output spaces.

[1]  Shih-Fu Chang,et al.  Learning Spread-Out Local Feature Descriptors , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[2]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Don R. Hush,et al.  Network constraints and multi-objective optimization for one-class classification , 1996, Neural Networks.

[4]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[5]  Anit Kumar Sahu,et al.  Federated Learning: Challenges, Methods, and Future Directions , 2019, IEEE Signal Processing Magazine.

[6]  Gang Niu,et al.  Convex Formulation for Learning from Positive and Unlabeled Data , 2015, ICML.

[7]  Jordi Vitrià,et al.  Discriminant ECOC: a heuristic method for application dependent design of error correcting output codes , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Sashank J. Reddi,et al.  Stochastic Negative Mining for Learning with Large Output Spaces , 2018, AISTATS.

[9]  Sanjiv Kumar,et al.  cpSGD: Communication-efficient and differentially-private distributed SGD , 2018, NeurIPS.

[10]  Nagarajan Natarajan,et al.  PU Learning for Matrix Completion , 2014, ICML.

[11]  Malik Yousef,et al.  One-Class SVMs for Document Classification , 2002, J. Mach. Learn. Res..

[12]  Charles Elkan,et al.  Learning classifiers from only positive and unlabeled data , 2008, KDD.

[13]  Ali Mousavi,et al.  Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces , 2019, NeurIPS.

[14]  H. Brendan McMahan,et al.  Generative Models for Effective ML on Private, Decentralized Datasets , 2019, ICLR.

[15]  Samy Bengio,et al.  Large Scale Online Learning of Image Similarity Through Ranking , 2009, J. Mach. Learn. Res..

[16]  Ian Goodfellow,et al.  Deep Learning with Differential Privacy , 2016, CCS.

[17]  Mehryar Mohri,et al.  Agnostic Federated Learning , 2019, ICML.

[18]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[20]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[21]  Peter Richtárik,et al.  Federated Learning: Strategies for Improving Communication Efficiency , 2016, ArXiv.

[22]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[23]  Tong Zhang Statistical behavior and consistency of classification methods based on convex risk minimization , 2003 .

[24]  Shih-Fu Chang,et al.  Designing Category-Level Attributes for Discriminative Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Sarvar Patel,et al.  Practical Secure Aggregation for Federated Learning on User-Held Data , 2016, ArXiv.

[27]  Philip S. Yu,et al.  Partially Supervised Classification of Text Documents , 2002, ICML.

[28]  Thomas G. Dietterich,et al.  Error-Correcting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs , 1991, AAAI.

[29]  Blaise Agüera y Arcas,et al.  Communication-Efficient Learning of Deep Networks from Decentralized Data , 2016, AISTATS.