Benchmarking data analysis and machine learning applications on the Intel KNL many-core processor

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users run data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe deep learning framework, have become increasingly important to LLSC users. The performance of these applications on KNL systems is therefore of high interest to LLSC users and to the broader data analysis and machine learning communities. Our benchmarks of these data analysis applications on the Intel KNL processor indicate that single-core double-precision generalized matrix multiply (DGEMM) performance on KNL systems has improved by ∼3.5× compared to prior Intel Xeon technologies. Our data analysis applications also achieved ∼60% of the theoretical peak performance. In addition, a performance comparison of a machine learning application, Caffe, between two Intel CPUs, the Xeon E5 v3 and the Xeon Phi 7210, demonstrated a 2.7× improvement on a KNL node.
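As an illustration of how figures such as a DGEMM rate and a fraction of theoretical peak are commonly derived, the following is a minimal Python sketch. It is not the benchmark code used in this work. The helper names (`time_best`, `dgemm_gflops`, `theoretical_peak_gflops`, `fraction_of_peak`) are hypothetical, the 2n³ operation count is the conventional estimate for an n×n matrix multiply, and the peak model (cores × clock × floating-point operations per cycle per core) is an assumption whose inputs must be taken from the vendor specification of the processor under test.

```python
import time


def time_best(fn, repeats=3):
    """Return the best (smallest) wall-clock time in seconds over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def dgemm_gflops(n, seconds):
    """Convert the time of one n x n by n x n multiply into GFLOP/s.

    Uses the conventional count of 2 * n**3 floating-point operations
    (n**3 multiplies plus n**3 additions).
    """
    return 2.0 * n**3 / seconds / 1e9


def theoretical_peak_gflops(cores, ghz, flops_per_cycle):
    """Assumed peak model: cores * clock (GHz) * FLOPs per cycle per core."""
    return cores * ghz * flops_per_cycle


def fraction_of_peak(measured_gflops, peak_gflops):
    """Ratio of the measured rate to the theoretical peak."""
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; they are not measurements.
    measured = dgemm_gflops(n=1000, seconds=1.0)
    peak = theoretical_peak_gflops(cores=2, ghz=1.5, flops_per_cycle=16)
    print(f"measured: {measured:.1f} GFLOP/s, "
          f"fraction of peak: {fraction_of_peak(measured, peak):.3f}")
```

In practice, `time_best` would wrap a call into an optimized BLAS DGEMM routine (for example, a matrix product in MATLAB, Octave, or a BLAS-backed array library), and the best of several runs would be reported to reduce the influence of warm-up effects.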
