Large Batch Optimization for Deep Learning Using New Complete Layer-Wise Adaptive Rate Scaling