A Survey to Predict the Trend of AI-able Server Evolution in the Cloud

About a decade ago, people were concerned about the risks of adopting cloud computing; it was an unproven technology that raised more questions than it answered. Nowadays, we hear more about the risks of not adopting the cloud. The three leading cloud players, Amazon Web Services, Microsoft Azure, and Google Cloud Platform, along with other participants, have built sophisticated cloud platforms that drive the cloud agenda and launch innovative products to meet the needs of modern businesses. Looking at processors, the core components of the cloud, there is a trend among hyperscale data centers to move beyond CPUs and turn to dedicated chips such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). We view this shift as a process of realizing artificial intelligence (AI) and provide a detailed survey of hardware server design in this process. After discussing and summarizing various disclosed techniques and platforms, we propose a hybrid hardware structure for efficient AI applications.
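
To make the idea of such a hybrid structure concrete, the sketch below shows how a scheduling layer might route inference workloads across CPU, GPU, FPGA, and ASIC resources within one server. It is a minimal illustration under our own assumptions: the Workload fields, the route heuristic, and the device labels are hypothetical and do not describe any vendor's platform or the specific structure proposed in the body of the survey.

# Illustrative only: a toy dispatcher for a hybrid CPU/GPU/FPGA/ASIC server.
# The thresholds and routing policy below are assumptions made for this sketch,
# not measurements or a disclosed platform interface.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    batch_size: int           # large batches favor throughput-oriented devices
    latency_budget_ms: float  # tight budgets favor fixed-function hardware
    uses_custom_ops: bool     # non-standard layers favor reconfigurable logic

def route(w: Workload) -> str:
    """Pick a device class for one inference workload (toy heuristic)."""
    if w.uses_custom_ops:
        return "FPGA"   # reconfigurable logic handles non-standard operators
    if w.latency_budget_ms < 5:
        return "ASIC"   # fixed-function accelerators give the lowest latency
    if w.batch_size >= 32:
        return "GPU"    # large batches keep many parallel cores busy
    return "CPU"        # small, irregular jobs stay on the general-purpose host

if __name__ == "__main__":
    jobs = [
        Workload("image-classification", batch_size=64, latency_budget_ms=50, uses_custom_ops=False),
        Workload("speech-to-text", batch_size=1, latency_budget_ms=3, uses_custom_ops=False),
        Workload("binarized-cnn", batch_size=8, latency_budget_ms=20, uses_custom_ops=True),
    ]
    for job in jobs:
        print(f"{job.name} -> {route(job)}")

Running the script prints one device assignment per sample job (for example, binarized-cnn -> FPGA), which is the kind of workload-to-accelerator mapping the hybrid structure is meant to support.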
