VALib and SimpleVector: tools for rapid initial research on vector architectures

Vector architectures have traditionally been applied in the supercomputing domain, with many successful incarnations. The energy efficiency and high performance of vector processors, as well as their applicability to other emerging domains, encourage further research on vector architectures. However, appropriate tools for this research are lacking. This paper presents two tools for measuring and analyzing an application's suitability for vector microarchitectures. The first tool is VALib, a library that enables hand-crafted vectorization of applications; its main purpose is to collect data for detailed instruction-level characterization and to generate input traces for the second tool. The second tool is SimpleVector, a fast trace-driven simulator that estimates the execution time of a vectorized application on a candidate vector microarchitecture. The potential of the tools is demonstrated using six applications from emerging application domains such as speech and face recognition, video encoding, bioinformatics, machine learning, and graph search. The results indicate that 63.2% to 91.1% of these contemporary applications are vectorizable. Over multiple use cases, we then demonstrate that the tools facilitate rapid evaluation of various vector architecture designs.
