ENHANCING THE MATRIX TRANSPOSE OPERATION USING INTEL AVX INSTRUCTION SET EXTENSION
[1] Jaeyoung Choi, et al. Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, 1995, Parallel Comput.
[2] J. O. Eklundh, et al. A Fast Computer Method for Matrix Transposing, 1972, IEEE Transactions on Computers.
[3] Jyh-Jong Tsay, et al. Optimal Algorithm for Matrix Transpose on Wormhole-Switched Meshes, 2003, J. Inf. Sci. Eng.
[4] Alan Jay Smith, et al. Multimedia extensions for general purpose microprocessors: a survey, 2005, Microprocess. Microsystems.
[5] Viktor K. Prasanna, et al. An Efficient Algorithm for Out-of-Core Matrix Transposition, 2002, IEEE Trans. Computers.
[6] Stanislav G. Sedukhin, et al. Matrix Transpose on 2D Torus Array Processor, 2006, The Sixth IEEE International Conference on Computer and Information Technology (CIT'06).
[7] Nicolai Petkov, et al. Systolic Parallel Processing, 1992.
[8] Sriram Krishnamoorthy, et al. Efficient parallel out-of-core matrix transposition, 2004, 2003 Proceedings IEEE International Conference on Cluster Computing.
[9] P. Sadayappan, et al. Efficient transposition algorithms for large matrices, 1993, Supercomputing '93.
[10] Ulrich Meyer, et al. Matrix transpose on meshes: theory and practice, 1997, Proceedings 11th International Parallel Processing Symposium.