Evaluation of Neural and Genetic Algorithms for Synthesizing Parallel Storage Schemes

Exploiting compile-time knowledge to improve memory bandwidth can produce noticeable improvements at run time [1, 2]. Allocating data structures [1] to separate memories whenever the data may be accessed in parallel yields improvements in memory access time of 13% to 40%. We are concerned with synthesizing compiler storage schemes that minimize array access conflicts in parallel memories for a set of compiler-predicted data access patterns. Such access patterns can easily be found for many synchronous dataflow computations, including multimedia compression/decompression algorithms, DSP, vision, and robotics. A storage scheme is a mapping from array addresses into memory modules. Finding a conflict-free storage scheme for a set of data patterns is NP-complete; the problem is reducible to weighted graph coloring. We investigate optimizing the storage scheme using constructive heuristics, neural methods, and genetic algorithms, and present the implementation details of these approaches. Using realistic data patterns, simulation shows that memory utilization of 80% or higher can be achieved for 20 data patterns over up to 256 parallel memories, i.e., a scalable parallel memory. The neural approach was comparatively fast at producing reasonably good solutions even for large problem sizes, and the convergence of the proposed neural algorithm appears to be only slightly dependent on problem size. Genetic algorithms are recommended for advanced compiler optimization, especially for large problem sizes and for applications that are compiled once and run many times over different data sets. The solutions presented are also useful for other optimization problems.
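To make the graph-coloring view concrete, the following is a minimal sketch (not the paper's algorithm): array addresses become vertices, an edge joins two addresses that co-occur in some predicted access pattern (weighted by the pattern's frequency), colors are memory banks, and a greedy pass assigns each address the bank that minimizes weighted conflicts with already-placed neighbors. The function names, pattern encoding, and tie-breaking order are illustrative assumptions.

```python
# Hypothetical illustration of the weighted-graph-coloring reduction:
# a greedy heuristic mapping array addresses to memory banks so that
# addresses accessed together in a pattern tend to land in different banks.
from collections import defaultdict

def build_conflict_graph(patterns):
    """patterns: list of (addresses, weight) pairs -> weighted edge dict."""
    edge_weight = defaultdict(int)
    for addresses, w in patterns:
        addrs = sorted(set(addresses))
        for i in range(len(addrs)):
            for j in range(i + 1, len(addrs)):
                # Addresses accessed in the same pattern conflict if
                # they share a bank; accumulate the pattern's weight.
                edge_weight[(addrs[i], addrs[j])] += w
    return edge_weight

def greedy_storage_scheme(patterns, num_banks):
    """Greedily color addresses (vertices) with banks (colors),
    minimizing the weighted conflict with already-assigned neighbors."""
    neighbors = defaultdict(list)
    for (a, b), w in build_conflict_graph(patterns).items():
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))
    # Place the most conflict-heavy addresses first.
    order = sorted(neighbors, key=lambda a: -sum(w for _, w in neighbors[a]))
    bank_of = {}
    for a in order:
        cost = [0] * num_banks
        for b, w in neighbors[a]:
            if b in bank_of:
                cost[bank_of[b]] += w
        bank_of[a] = min(range(num_banks), key=lambda k: cost[k])
    return bank_of

# Example: two access patterns over addresses 0..3; with 4 banks the
# greedy pass can place all four addresses in distinct banks.
scheme = greedy_storage_scheme([((0, 1, 2, 3), 5), ((0, 2), 3)], 4)
```

The constructive heuristics, neural methods, and genetic algorithms studied in the paper explore this same search space but with different mechanisms for escaping the local optima that a one-pass greedy assignment can get stuck in.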

[1] Marshall C. Pease, et al. The Indirect Binary n-Cube Microprocessor Array, 1977, IEEE Transactions on Computers.

[2] Rajiv Gupta, et al. Compile-Time Techniques for Improving Scalar Access Performance in Parallel Memories, 1991, IEEE Trans. Parallel Distributed Syst.

[3] W. W. Hwu, et al. Achieving high instruction cache performance with an optimizing compiler, 1989, ISCA '89.

[4] D. E. Goldberg, et al. Genetic Algorithms in Search, 1989.

[5] Alan Norton, et al. A Class of Boolean Linear Transformations for Conflict-Free Power-of-Two Stride Access, 1987, ICPP.

[6] Cesare Alippi, et al. Genetic-algorithm programming environments, 1994, Computer.

[7] James Smith, et al. A Simulation Study of the CRAY X-MP Memory System, 1986, IEEE Transactions on Computers.

[8] André Seznec, et al. Interleaved Parallel Schemes, 1994, IEEE Trans. Parallel Distributed Syst.

[9] Michael J. Quinn, et al. Designing Efficient Algorithms for Parallel Computers, 1987.

[10] Duncan H. Lawrie, et al. Access and Alignment of Data in an Array Processor, 1975, IEEE Transactions on Computers.

[11] Todd C. Mowry, et al. Compiler-directed page coloring for multiprocessors, 1996, ASPLOS VII.

[12] David T. Harper, et al. Increased Memory Performance During Vector Accesses Through the use of Linear Address Transformations, 1992, IEEE Trans. Computers.

[13] Gurindar S. Sohi. High-Bandwidth Interleaved Memories for Vector Processors-A Simulation Study, 1993, IEEE Trans. Computers.

[14] Mayez A. Al-Mouhamed, et al. A Heuristic Storage for Minimizing Access Time of Arbitrary Data Patterns, 1997, IEEE Trans. Parallel Distributed Syst.

[15] Christoforos E. Kozyrakis, et al. A New Direction for Computer Architecture Research, 1998, Computer.

[16] Avrim Blum, et al. New approximation algorithms for graph coloring, 1994, JACM.

[17] William Jalby, et al. XOR-Schemes: A Flexible Data Organization in Parallel Memories, 1985, ICPP.

[18] Scott McFarling, et al. Program optimization for instruction caches, 1989, ASPLOS III.

[19] Linda G. Shapiro, et al. Computer and Robot Vision, 1991.

[20] Shin-Ichi Nakano, et al. Edge-Coloring Partial k-Trees, 1996, J. Algorithms.

[21] Mayez A. Al-Mouhamed, et al. Minimization of Memory and Network Contention for Accessing Arbitrary Data Patterns in SIMD Systems, 1996, IEEE Trans. Computers.

[22] Emanuele Trucco, et al. Computer and Robot Vision, 1995.

[23] Kai Hwang, et al. Computer architecture and parallel processing, 1984, McGraw-Hill Series in computer organization and architecture.

[24] Tony R. Martinez, et al. Digital Neural Networks, 1988, Proceedings of the 1988 IEEE International Conference on Systems, Man, and Cybernetics.

[25] Harvey G. Cragon, et al. Memory systems and pipelined processors, 1996.

[26] Anil K. Jain, et al. Artificial Neural Networks: A Tutorial, 1996, Computer.

[27] David E. Goldberg, et al. Genetic Algorithms in Search Optimization and Machine Learning, 1988.

[28] Kyungsook Y. Lee. On the Rearrangeability of 2(log2N) - 1 Stage Permutation Networks, 1985, IEEE Trans. Computers.

[29] Paul Budnik, et al. The Organization and Use of Parallel Memories, 1971, IEEE Transactions on Computers.

[30] Ashoke Deb. Multiskewing-A Novel Technique for Optimal Parallel Memory Access, 1996, IEEE Trans. Parallel Distributed Syst.

[31] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.

[32] David T. Harper, et al. Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems, 1991, IEEE Trans. Parallel Distributed Syst.

[33] Paul Chow, et al. Exploiting dual data-memory banks in digital signal processors, 1996, ASPLOS VII.