The need for energy-efficient high-end systems has led hardware vendors to
design new types of chips for general-purpose computing. However, designing or
porting code tailored to these new types of processing units is often
considered a major hurdle to their broad adoption. In this paper, we
consider a modern Intel Xeon Phi processor, namely the Intel Knights Landing
(KNL) and a numerical code initially designed for a classical multi-core
system. More precisely, we consider the qr_mumps scientific library implementing a
sparse direct method on top of the StarPU runtime system. We show that with a
portable programming model (task-based programming), good software support
(a robust runtime system coupled with an efficient scheduler), and a few
well-defined hardware and software settings, we are able to transparently run the
exact same numerical code. This code not only achieves very high performance
(up to 1 TFlop/s) on the KNL but also significantly outperforms a modern Intel
Xeon multi-core processor, by up to a factor of 2.0, in terms of both time to solution and energy efficiency.