Federated Over-the-Air Subspace Learning from Incomplete Data

Federated learning refers to a distributed learning scenario in which users/nodes keep their data private and share only intermediate, locally computed iterates with the master node. The master, in turn, shares a global aggregate of these iterates with all the nodes at each iteration. In this work, we consider a wireless federated learning scenario in which the nodes communicate with the master node over a wireless channel. Current and upcoming technologies such as 5G (and beyond) will operate mostly in a non-orthogonal multiple access (NOMA) mode, in which transmissions from the users occupy the same bandwidth and interfere at the access point. These technologies naturally lend themselves to "over-the-air" superposition, whereby the signals received from the user nodes are directly summed at the master node. However, over-the-air aggregation also means that channel noise corrupts the algorithm iterates at the time of aggregation at the master. This iteration noise introduces a novel set of challenges not previously studied in the literature, and it must be treated differently from the well-studied setting of noise or corruption in the dataset itself. In this work, we first study the subspace learning problem in a federated over-the-air setting. Subspace learning involves computing the subspace spanned by the top $r$ singular vectors of a given matrix. We develop a federated over-the-air version of the power method (FedPM) and show that its iterates converge as long as (i) the channel noise is very small compared to the $r$-th singular value of the matrix, and (ii) the ratio between its $(r+1)$-th and $r$-th singular values is bounded by a constant strictly less than one. The second important contribution of this work is to show how over-the-air FedPM can be used to obtain a provably accurate federated solution for subspace tracking in the presence of missing data.
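To make the over-the-air aggregation model concrete, the following is a minimal simulation sketch, not the paper's actual FedPM algorithm or analysis: each node holds a column block $A_k$ of the data matrix, locally computes $A_k (A_k^\top U)$, and the master receives the superposed sum plus a single additive channel-noise term before re-orthonormalizing via QR. All sizes, the noise level `sigma_c`, and the iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K nodes each hold a column block A_k of a global
# rank-r data matrix; the master wants the span of the top-r left
# singular vectors (here, the span of U_star).
n, K, cols_per_node, r = 50, 4, 30, 3
U_star = np.linalg.qr(rng.standard_normal((n, r)))[0]   # ground-truth subspace
blocks = [U_star @ rng.standard_normal((r, cols_per_node)) for _ in range(K)]

sigma_c = 1e-4   # per-entry channel-noise std, assumed small vs the r-th singular value
U = np.linalg.qr(rng.standard_normal((n, r)))[0]        # random orthonormal init

for _ in range(30):
    # Each node computes A_k (A_k^T U) locally and transmits it in analog;
    # over-the-air superposition sums the transmissions, and the channel
    # adds noise once, at the point of aggregation at the master.
    Y = sum(A_k @ (A_k.T @ U) for A_k in blocks)
    Y += sigma_c * rng.standard_normal(Y.shape)         # aggregation (iteration) noise
    U, _ = np.linalg.qr(Y)                              # master re-orthonormalizes

# Subspace error: distance between span(U) and span(U_star)
err = np.linalg.norm(U_star - U @ (U.T @ U_star), 2)
```

Note that, unlike noise added to the dataset itself, the noise here re-enters at every iteration, which is why the convergence conditions in the abstract bound the channel-noise level relative to the $r$-th singular value.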
