Neural Networks, Hypersurfaces, and the Generalized Radon Transform [Lecture Notes]

Artificial neural networks (ANNs) have long served as a mathematical modeling tool and have recently found numerous applications in science and technology, including computer vision, signal processing, and machine learning [1], to name a few. Although notable function-approximation results exist [2], theoretical explanations have yet to catch up with newer developments, particularly with regard to (deep) hierarchical learning. As a consequence, ANN practitioners are often left with open questions: How many layers should one use? What is the effect of different activation functions? What are the effects of pooling? And many others.