ATA: Attentional Non-Linear Activation Function Approximation for VLSI-Based Neural Networks

In this letter, we present ATA, an attentional non-linear activation function approximation method for VLSI-based neural networks. Unlike other approximation methods, which pursue low hardware resource usage at the cost of a high loss in recognition accuracy, ATA exploits pixel attention to focus on important features, preserving recognition accuracy while reducing resource cost. Specifically, attention is applied within the activation function by using approximated activation functions with different fitting errors in the VLSI implementation: important features are highlighted by a piecewise linear function and an improved look-up table with low fitting error, while trivial features are suppressed by an approximation with large fitting error. Experimental results demonstrate that ATA outperforms other state-of-the-art approximation methods in recognition accuracy, power, and area.
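The core idea, selecting a low-error approximation for attended features and a cheap high-error one for trivial features, can be illustrated in software. The sketch below is a minimal illustration, not the paper's hardware design: the function names, the tanh target, the segment counts, and the attention threshold are all assumptions chosen for clarity.

```python
import numpy as np

def tanh_pwl(x, n_segments):
    # Piecewise linear approximation of tanh on [-4, 4].
    # More segments -> lower fitting error (finer, costlier in hardware).
    knots = np.linspace(-4.0, 4.0, n_segments + 1)
    vals = np.tanh(knots)
    return np.interp(np.clip(x, -4.0, 4.0), knots, vals)

def ata_activation(x, attention, threshold=0.5, fine=32, coarse=4):
    # Hypothetical ATA-style selection: pixels with high attention use
    # the low-fitting-error (fine) approximation, while trivial pixels
    # use the cheap, high-fitting-error (coarse) one.
    fine_out = tanh_pwl(x, fine)
    coarse_out = tanh_pwl(x, coarse)
    return np.where(attention > threshold, fine_out, coarse_out)
```

In hardware, the analogous choice would be between a fine and a coarse look-up table or piecewise linear unit rather than two software evaluations; the point of the sketch is only that the fitting error is made attention-dependent.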
