Matrix Differential Calculus

Elementary matrix analysis is a staple of problems in machine learning and data mining. Most often one is engaged in optimizing objective functions, and the development of methods for such optimization traditionally begins with finding gradients and derivatives. In this manuscript I present a very basic introduction to computing matrix derivatives of simple functions. I adopt two approaches. First, I proceed by elementwise differentiation, computing partial derivatives at each step and assembling the matrix form of the derivative from them. Second, I introduce some well known methods that provide an organized set of tools for computing matrix derivatives without laboring through the elementwise process, which can be error-prone because of the jungle of indices. The second approach is more elegant, but it can be more confusing for beginners than the first. First appreciating the concrete examples and then proceeding to the elegant theory seems to me a good course to take, and that is the course taken in this document.
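To make the contrast between the two approaches concrete, here is a small worked example that is not taken from the manuscript itself: the quadratic form f(x) = x^T A x for a constant matrix A and a vector x, differentiated first elementwise and then via the differential.

% Illustrative example (an assumed choice, not the manuscript's): f(x) = x^T A x.
% Approach 1: elementwise partial derivatives.
\[
  f(x) = \sum_{i,j} x_i A_{ij} x_j, \qquad
  \frac{\partial f}{\partial x_k}
  = \sum_j A_{kj} x_j + \sum_i x_i A_{ik}
  = \big[(A + A^\top) x\big]_k
  \;\Longrightarrow\; \nabla f(x) = (A + A^\top) x.
\]
% Approach 2: the differential, with no index bookkeeping.
\[
  \mathrm{d}f = (\mathrm{d}x)^\top A x + x^\top A\,\mathrm{d}x
  = x^\top (A + A^\top)\,\mathrm{d}x
  \;\Longrightarrow\; \nabla f(x) = (A + A^\top) x.
\]

Both routes give the same gradient; the elementwise route tracks indices explicitly, while the differential route reads the gradient off as the coefficient of dx.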
