Efficient Optimization for Sparse Gaussian Process Regression

K is the n × n full covariance matrix to be factorized. K does not need to be precomputed (which would take O(n²) storage); it only needs to return its diagonal and a specified column when queried (via a function handle, for example). If σ is supplied, the algorithm below operates with an additional twist enabling the augmentation trick introduced in Sec. 3 of the paper, in which case the matrix L in the algorithm is the augmented version L̃ mentioned in the paper, and L[1:n, :] is the un-augmented portion. Q, R form the QR factorization of L.

The procedure also returns two vectors, p and d. p is a permutation of (1, 2, ..., n), and d stores the diagonal of the residual between the full K and the current partial Cholesky factorization. In our application to kernel covariance matrices, d is also the point-wise variance not yet explained by (the factorization using) the existing inducing points. See the post-conditions after the algorithm for the formal relationships among these quantities.
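The paper's exact algorithm (with its incremental QR updates) is not reproduced here; the following is a minimal Python sketch of a pivoted partial Cholesky with the interface described above: K is accessed only through function handles for its diagonal and columns, and p and d are maintained as the pivot permutation and the residual diagonal. The names partial_cholesky, get_col, and get_diag are ours, the final-vstack form of the σ-augmentation is an assumed (common) realization of the trick rather than the paper's incremental version, and K is assumed to have numerical rank at least m.

```python
import numpy as np

def partial_cholesky(get_col, get_diag, n, m, sigma=None):
    """Rank-m pivoted partial Cholesky of an n x n PSD matrix K.

    K is never formed: get_diag() returns diag(K) and get_col(j) returns
    K[:, j], so only O(nm) storage is needed for the factor itself.
    """
    d = np.asarray(get_diag(), dtype=float).copy()  # residual diagonal, diag(K - L L^T)
    p = np.arange(n)                                # permutation of (1, ..., n)
    L = np.zeros((n, m))
    for k in range(m):
        # Pivot on the point with the largest unexplained variance.
        j = k + int(np.argmax(d[p[k:]]))
        p[k], p[j] = p[j], p[k]
        piv = p[k]
        # k-th factor column: residual column at the pivot, normalized.
        col = get_col(piv)                          # K[:, piv], queried on demand
        ell = (col - L @ L[piv, :]) / np.sqrt(d[piv])
        L[:, k] = ell
        # Downdate the residual diagonal so d = diag(K - L @ L.T).
        d -= ell ** 2
        d[piv] = 0.0                                # exact zero, avoids round-off
    if sigma is None:
        return L, p, d
    # Assumed form of the augmentation: stacking sigma * I_m below L gives
    # L_aug.T @ L_aug = L.T @ L + sigma**2 * I, so the R factor of a QR of
    # L_aug is a Cholesky factor of the regularized Gram matrix.
    L_aug = np.vstack([L, sigma * np.eye(m)])
    Q, R = np.linalg.qr(L_aug)
    return L_aug, p, d, Q, R
```

A usage sketch with an RBF kernel, where K is never materialized and get_col computes a single kernel column on demand:

```python
X = np.random.randn(500, 3)
sq = (X ** 2).sum(axis=1)
get_diag = lambda: np.ones(len(X))                          # k(x, x) = 1 for RBF
get_col = lambda j: np.exp(-0.5 * (sq - 2 * X @ X[j] + sq[j]))
L, p, d = partial_cholesky(get_col, get_diag, n=len(X), m=30)
```

After the call, d is non-negative, d[p[:m]] is zero (the pivoted points are fully explained), and d elsewhere gives the point-wise variance not yet captured by the m inducing points, matching the post-conditions stated after the algorithm.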