With the growing number of parallel architectures and associated programming models, benchmarking has become very difficult: parallel programming requires architecture-dependent compilers and languages as well as a high level of programming expertise. Beyond comparing architectures with synthetic benchmarks, benchmarking is increasingly used to design specialized systems composed of heterogeneous computing resources that optimize performance or the performance-per-watt ratio (e.g., embedded-system designers build Systems-on-Chip (SoCs) out of dedicated, well-chosen components). In the High-Performance Computing (HPC) domain, systems are designed with symmetric and scalable computing nodes built to deliver the highest performance on a wide variety of applications. However, HPC now faces cost and power-consumption issues that motivate the design of heterogeneous systems. This is one of the rationales of the European FiPS project, which proposes to develop a hardware architecture and a software methodology that ease the design of such systems. A fair comparison between architectures for a given application is therefore of growing importance. Unfortunately, porting the application to every available architecture using the corresponding programming models is infeasible in practice. To tackle this challenge, we introduced a novel methodology to evaluate and compare parallel architectures in order to ease the programmer's work. Based on micro-benchmarks, code profiling, and characterization tools, this methodology provides a semi-automatic prediction of the performance of sequential applications on a set of parallel architectures. In addition, the performance estimate is correlated with the cost of other criteria such as power or portability effort. Originally introduced for vision-based embedded applications, our methodology is currently being extended to target more complex applications from the HPC world.
This paper extends our work with new experiments and early results on a real HPC application: DNA sequencing.