Synthetic Gradient Methods with Virtual Forward-Backward Networks

● We applied VFBN to decouple ResNet-110 into two subnetworks, using a 4-layer CNN (two ResNet modules) as the VFBN.
  ○ The learning curve of Jaderberg's model falls significantly behind that of backpropagation (BP), while our VFBN keeps pace with BP throughout.
  ○ With VFBN, the error rate is 5.51%, which is better than baselines such as half-ResNet (5.76%) and subnetwork-wise supervised loss learning (5.71%), but worse than standard BP.
● VFBN improved the quality of synthetic gradients over the original model in terms of cosine distance.
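
The cosine-distance comparison mentioned above can be illustrated with a minimal sketch. It assumes the common definition, one minus the cosine similarity between the flattened synthetic gradient and the true BP gradient. The text does not state the exact formula used, and the function name `cosine_distance` and the toy vectors are hypothetical, not taken from the original work.

```python
import math


def cosine_distance(synthetic_grad, true_grad):
    """Return 1 - cosine similarity between two flattened gradient vectors.

    Assumed definition: 0 means the directions match, 1 means they are
    orthogonal, and 2 means they point in opposite directions.
    """
    if len(synthetic_grad) != len(true_grad):
        raise ValueError("gradients must have the same number of elements")
    dot = sum(s * t for s, t in zip(synthetic_grad, true_grad))
    norm_s = math.sqrt(sum(s * s for s in synthetic_grad))
    norm_t = math.sqrt(sum(t * t for t in true_grad))
    if norm_s == 0.0 or norm_t == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_s * norm_t)


# Toy vectors for illustration only; these are not values from the experiments.
true_grad = [1.0, 0.0]
aligned_estimate = [2.0, 0.0]      # same direction, different magnitude
orthogonal_estimate = [0.0, 3.0]   # carries no information about the direction

print(cosine_distance(aligned_estimate, true_grad))
print(cosine_distance(orthogonal_estimate, true_grad))
```

Under this definition a lower value means the synthetic gradient points closer to the direction of the true gradient, regardless of its magnitude.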