A theoretical argument for complex-valued convolutional networks

A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with several complex-valued vectors, followed by (2) taking the absolute value of every entry of the resulting vectors, followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as “data-driven multiscale windowed power spectra,” “data-driven multiscale windowed absolute spectra,” “data-driven multiwavelet absolute values,” or (in their most general configuration) “data-driven nonlinear multiwavelet packets.” Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max pooling, etc., do not obviously exhibit the same correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than a vague analogy).

This note develops “data-driven multiscale windowed spectra” for certain stochastic processes that are common in the modeling of time series (such as audio) and natural images (including patterns and textures). We motivate the construction of such multiscale spectra in the form of “local averages of multiwavelet absolute values” or (in the most general configuration) “nonlinear multiwavelet packets” and connect these to certain “complex-valued convolutional networks.” A textbook treatment of all concepts and terms used above and below is given by [12]. Further information is available in the original work of [7], [15], [5], [4], [19], [16], [9], [20], and [18], for example. The work of [8], [13], [17], [2], and [3] also develops complex-valued convnets.
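As a minimal illustration (not part of the original text), the three-step composition above can be sketched in NumPy. The filter width, frequencies, and pooling width below are hypothetical choices; taking the filters to be windowed complex exponentials, as described above, makes the layer compute a windowed absolute spectrum:

```python
import numpy as np

def complex_convnet_layer(x, filters, pool_width):
    """One layer: (1) convolve with complex-valued filters, (2) take the
    absolute value of every entry, (3) locally average (mean pool)."""
    outputs = []
    for f in filters:
        y = np.convolve(x, f, mode="valid")   # (1) complex-valued convolution
        y = np.abs(y)                         # (2) entrywise absolute value
        # (3) local averaging over non-overlapping windows of pool_width
        n = (len(y) // pool_width) * pool_width
        outputs.append(y[:n].reshape(-1, pool_width).mean(axis=1))
    return outputs

# Windowed complex exponentials as filters (hypothetical width and
# frequencies), so each output channel is a windowed absolute spectrum
# at one frequency.
width = 16
window = np.hanning(width)
freqs = [0.05, 0.1, 0.2]
filters = [window * np.exp(2j * np.pi * f * np.arange(width)) for f in freqs]

x = np.abs(np.random.default_rng(0).standard_normal(256))  # nonnegative input
spectra = complex_convnet_layer(x, filters, pool_width=8)
```

Applying the layer recursively to its own (nonnegative) outputs, with the filters learned from data rather than fixed, gives the data-driven construction discussed above.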
Renormalization group theory and its connection to convnets are discussed by [14]; the connection is insightful, but we leave further discussion to the cited work. Our exposition relies on nothing but the basic signal processing treated by [12]. For simplicity, we first limit consideration to the special case of a doubly infinite sequence of nonnegative random variables X_k, where k ranges over the integers. This input data will be the result of convolving an unmeasured independent and identically distributed (i.i.d.) sequence Z_k, where k ranges over the integers, with an unknown sequence of real numbers f_k, where k ranges over the integers (the latter sequence is known as a “filter,” whereas the i.i.d. sequence is known as “white noise”):

X_k = Σ_j f_(k−j) Z_j,  where the sum ranges over all integers j.
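To make the model concrete, the following small NumPy sketch (not from the original text) generates such a filtered-white-noise process and estimates its average windowed power spectrum; the short filter f and the exponentially distributed noise are hypothetical choices, picked only so that the observed sequence is nonnegative as assumed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical filter f and i.i.d. nonnegative "white noise" Z, so that
# the observed X_k = sum_j f_(k-j) Z_j are nonnegative.
f = np.array([0.25, 0.5, 0.25])
Z = rng.exponential(scale=1.0, size=4096)

# The observed sequence is the (unknown) filter applied to unobserved noise.
X = np.convolve(Z, f, mode="valid")

# Average windowed power spectrum of X, i.e., the quantity that local
# averages of windowed-exponential convolutions followed by absolute
# values can recover.
width = 64
windows = X[: (len(X) // width) * width].reshape(-1, width)
avg_power = np.mean(np.abs(np.fft.rfft(windows, axis=1)) ** 2, axis=0)
```

In this idealized setting, the averaged windowed power spectrum of X is shaped by |f̂|², which is what a data-driven multiscale windowed spectrum would estimate from the observations alone.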

[1] Ingrid Daubechies, et al. Ten Lectures on Wavelets, 1992.

[2] Y. Meyer. Wavelets and Operators, 1993.

[3] Ronald R. Coifman, et al. Signal processing and compression with wavelet packets, 1994.

[4] D. Donoho, et al. Translation-Invariant DeNoising, 1995.

[5] William T. Freeman, et al. Presented at: 2nd Annual IEEE International Conference on Image, 1995.

[6] Y. Meyer, et al. Wavelets: Calderón-Zygmund and Multilinear Operators, 1997.

[7] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[8] David G. Lowe, et al. Object recognition from local scale-invariant features, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[9] Stéphane Mallat, et al. Locally stationary covariance and signal estimation with macrotiles, 2003, IEEE Trans. Signal Process.

[10] Eero P. Simoncelli, et al. On Advances in Statistical Modeling of Natural Images, 2004, Journal of Mathematical Imaging and Vision.

[11] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.

[12] David G. Lowe, et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004, International Journal of Computer Vision.

[13] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[14] Ronald R. Coifman, et al. Local discriminant bases and their applications, 1995, Journal of Mathematical Imaging and Vision.

[15] Ronald W. Schafer, et al. Introduction to Digital Speech Processing, 2007, Found. Trends Signal Process.

[16] Stéphane Mallat, et al. A Wavelet Tour of Signal Processing - The Sparse Way, 3rd Edition, 2008.

[17] Luc Van Gool, et al. Speeded-Up Robust Features (SURF), 2008, Comput. Vis. Image Underst.

[18] Olaf Hellwich, et al. Complex-Valued Convolutional Neural Networks for Object Detection in PolSAR data, 2010.

[19] Stéphane Mallat. Recursive interferometric representations, 2010, EUSIPCO.

[20] S. Mallat. Recursive interferometric representations, 2010, European Signal Processing Conference.

[21] Binoy Pinto, et al. Speeded Up Robust Features, 2011.

[22] Lorenzo Rosasco, et al. The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work), 2012.

[23] S. Mallat, et al. Intermittent process analysis with scattering moments, 2013, arXiv:1311.4104.

[24] Stéphane Mallat, et al. Invariant Scattering Convolution Networks, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25] David J. Schwab, et al. An exact mapping between the Variational Renormalization Group and Deep Learning, 2014, arXiv.

[26] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.