A Review of SIMD Multimedia Extensions and their Usage in Scientific and Engineering Applications
[1] Ruby B. Lee. Accelerating multimedia with enhanced microprocessors , 1995, IEEE Micro.
[2] Alejandro Ramírez Bellido,et al. On the scalability of 1- and 2-dimensional SIMD extensions for multimedia applications , 2005 .
[3] Ariel Ortiz. Teaching the SIMD execution model: assembling a few parallel programming skills , 2003, SIGCSE.
[4] Stamatis Vassiliadis,et al. Performance Impact of Misaligned Accesses in SIMD Extensions , 2006 .
[5] Richard H. Stern. Net access-divvying up the pie [Copyright and the Internet] , 1996, IEEE Micro.
[6] Chia-Lin Yang,et al. Using Intel Streaming SIMD Extensions for 3D Geometry Processing , 2002, IEEE Pacific Rim Conference on Multimedia.
[7] Ville Lappalainen. Performance of an advanced video codec on a general-purpose processor with media ISA extensions , 2000, 2000 Digest of Technical Papers. International Conference on Consumer Electronics. Nineteenth in the Series (Cat. No.00CH37102).
[8] Yen-Kuang Chen,et al. Implementation of H.264 encoder on general-purpose processors with hyper-threading technology , 2004, IS&T/SPIE Electronic Imaging.
[9] Yu-Fai Fung,et al. A parallel solution to linear systems , 2002, Microprocess. Microsystems.
[10] Lizy Kurian John,et al. Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements , 2003, IEEE Trans. Computers.
[11] Uri C. Weiser,et al. MMX technology extension to the Intel architecture , 1996, IEEE Micro.
[12] Jose Fridman. Data alignment for sub-word parallelism in DSP , 1999, 1999 IEEE Workshop on Signal Processing Systems. SiPS 99. Design and Implementation (Cat. No.99TH8461).
[13] Fred Weber,et al. AMD 3DNow! technology: architecture and implementations , 1999, IEEE Micro.
[14] Antonio Carlos,et al. Improving processing time of large images by instruction level parallelism , 2001 .
[15] Faouzi Kossentini,et al. Efficient coding and mapping algorithms for software-only real-time video coding at low bit rates , 2000, IEEE Trans. Circuits Syst. Video Technol..
[16] Gerhard Fettweis,et al. Compiler based exploration of DSP energy savings by SIMD operations , 2004, ASP-DAC.
[17] R. Nigel Horspool,et al. Compiler optimizations for processors with SIMD instructions , 2007, Softw. Pract. Exp..
[18] Gonzalo Travieso,et al. Matrix calculations with SIMD floating point instructions on x86 processors , 2001 .
[19] José González,et al. Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions , 2005, J. VLSI Signal Process..
[20] Isom L. Crawford,et al. Software Optimization for High Performance Computers , 2000 .
[21] Ja-Ling Wu,et al. MMX-based DCT and MC algorithms for real-time pure software MPEG decoding , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.
[22] Y. Fisher. Fractal image compression: theory and application , 1995 .
[23] Kazumaro Aoki,et al. Elliptic Curve Arithmetic Using SIMD , 2001, ISC.
[24] Moinul H. Khan,et al. Accelerating mobile multimedia using Intel Wireless MMX™ technology. , 2004 .
[25] Peter Pirsch,et al. Instruction Set Extensions for MPEG-4 Video , 1999, J. VLSI Signal Process..
[26] Torbjørn Rognes,et al. Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors , 2000, Bioinform..
[27] Vladimir M. Pentkovski,et al. Implementing Streaming SIMD Extensions on the Pentium III Processor , 2000, IEEE Micro.
[28] Gang Ren,et al. A Preliminary Study on the Vectorization of Multimedia Applications for Multimedia Extensions , 2003, LCPC.
[29] Norman P. Jouppi,et al. Performance of image and video processing with general-purpose processors and media ISA extensions , 1999, ISCA.
[30] Peng Wu,et al. Efficient SIMD code generation for runtime alignment and length conversion , 2005, International Symposium on Code Generation and Optimization.
[31] Yen-Kuang Chen,et al. Video applications on hyper-threading technology , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.
[32] Alan Jay Smith,et al. Multimedia extensions for general purpose microprocessors: a survey , 2005, Microprocess. Microsystems.
[33] Kenneth A. Ross,et al. Implementing database operations using SIMD instructions , 2002, SIGMOD '02.
[34] Hunter Scales,et al. AltiVec Extension to PowerPC Accelerates Media Processing , 2000, IEEE Micro.
[35] F. Sanchez,et al. Parallel processing in biological sequence comparison using general purpose processors , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..
[36] Pankaj Godbole. Optimizing the advanced encryption standard on Intel's SIMD architecture , 2004 .
[37] Andreas Krall,et al. Compiler optimizations for processors with SIMD instructions , 2007, Softw. Pract. Exp..
[38] Stamatis Vassiliadis,et al. Performance comparison of SIMD implementations of the discrete wavelet transform , 2005, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05).
[39] Henry G. Dietz,et al. The Scc Compiler: SWARing at MMX 3DNow! , 1999, LCPC.
[40] Emmett Witchel,et al. Increasing and detecting memory address congruence , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[41] Michael D. Smith,et al. Guest Editorial: Media processing: a new design target , 1996, IEEE Micro.
[42] B. Reese,et al. Real-time H.264-AVC codec on Intel architectures , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..
[43] Edward A. Lee,et al. DSP Processor Fundamentals: Architectures and Features , 1997 .
[44] Francisco Tirado Fernández,et al. 2-D wavelet transform enhancement on general-purpose microprocessors: memory hierarchy and SIMD parallelism exploitation , 2002 .
[45] Yen-Kuang Chen,et al. Implementation of H.264 encoder and decoder on personal computers , 2006, J. Vis. Commun. Image Represent..
[46] Francesco Zanichelli,et al. The long and winding road to high-performance image processing with MMX/SSE , 2000, Proceedings Fifth IEEE International Workshop on Computer Architectures for Machine Perception.
[47] Masao Ikekawa,et al. Fast 2D IDCT implementation with multimedia instructions for a software MPEG2 decoder , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[48] Mateo Valero,et al. Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[49] Shorin Kyo,et al. An Extended C Language and a SIMD Compiler for Efficient Implementation of Image Filters on Media Extended Micro-Processors , 2003 .
[50] Douglas Aberdeen,et al. General Matrix-Matrix Multiplication Using SIMD Features of the PIII (Research Note) , 2000, Euro-Par.
[51] Takashi Miyazaki,et al. Radix-4 FFT implementation using SIMD multimedia instructions , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).
[52] A. Uhl,et al. SIMD Parallelization of Common Wavelet Filters , 2005 .
[53] Peter Pirsch,et al. VLSI architectures for video compression-a survey , 1995, Proc. IEEE.
[54] Gregory W. Heckler,et al. SIMD correlator library for GNSS software receivers , 2006 .
[55] Dinesh Manocha,et al. Fast computation of database operations using graphics processors , 2004, SIGMOD '04.
[56] Stamatis Vassiliadis,et al. Limitations of special-purpose instructions for similarity measurements in media SIMD extensions , 2006, CASES '06.
[57] Ayal Zaks,et al. Compiler Vectorization Techniques for a Disjoint SIMD Architecture , 2002 .
[58] Stamatis Vassiliadis,et al. Instruction set architecture enhancements for video processing , 2005, 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05).
[59] Ramesh Radhakrishnan,et al. Evaluating MMX technology using DSP and multimedia applications , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.
[60] Ville Lappalainen,et al. Overview of research efforts on media ISA extensions and their usage in video coding , 2002, IEEE Trans. Circuits Syst. Video Technol..
[61] Franz Franchetti,et al. Efficient Utilization of SIMD Extensions , 2005, Proceedings of the IEEE.
[62] Douglas Aberdeen,et al. Emmerald: a fast matrix–matrix multiply using Intel's SSE instructions , 2001, Concurr. Comput. Pract. Exp..
[63] Uri C. Weiser,et al. Intel's MMX™ technology-a new instruction set extension , 1997, Proceedings IEEE COMPCON 97. Digest of Papers.
[64] M S Waterman,et al. Identification of common molecular subsequences. , 1981, Journal of molecular biology.
[65] Yen-Kuang Chen,et al. Implementation of H.264 decoder on general-purpose processors with media instructions , 2003, IS&T/SPIE Electronic Imaging.
[66] Aart J. C. Bik,et al. Automatic Intra-Register Vectorization for the Intel® Architecture , 2002, International Journal of Parallel Programming.
[67] Franz Franchetti,et al. SIMD Vectorization of Straight Line FFT Code , 2003, Euro-Par.
[68] Joos Vandewalle,et al. Fast Hashing on the Pentium , 1996, CRYPTO.
[69] Chia-Lin Yang,et al. Exploiting Parallelism in Geometry Processing with General Purpose Processors and Floating-Point SIMD Instructions , 2000, IEEE Trans. Computers.
[70] Omar Hammami,et al. Application-specific SIMD synthesis for reconfigurable architectures , 2006, Microprocess. Microsystems.
[71] Ville Lappalainen,et al. Performance analysis of Intel MMX technology for an H.263 video encoder , 1998, MULTIMEDIA '98.
[72] Francisco Tirado,et al. Vectorization of the 2D wavelet lifting transform using SIMD extensions , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[73] D. Naishlos,et al. Autovectorization in GCC , 2004 .
[74] Aart J. C. Bik,et al. Multimedia vectorization of floating‐point MIN/MAX reductions , 2006, Concurr. Comput. Pract. Exp..
[75] Georges-André Silber,et al. An Empirical Study of Some x86 SIMD Integer Extensions , 2005 .
[76] Tsuyoshi Takagi,et al. Fast Elliptic Curve Multiplications with SIMD Operations , 2004, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..
[77] Patricio Bulic,et al. An Extended ANSI C for Processors with a Multimedia Extension , 2004, International Journal of Parallel Programming.
[78] T. Rognes,et al. ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches. , 2001, Nucleic acids research.
[79] Andreas Krall,et al. Compilation Techniques for Multimedia Processors , 2004, International Journal of Parallel Programming.
[80] Ruby B. Lee. Multimedia extensions for general-purpose processors , 1997, 1997 IEEE Workshop on Signal Processing Systems. SiPS 97 Design and Implementation formerly VLSI Signal Processing.
[81] I. Kuroda,et al. Multimedia processors , 1998, Proc. IEEE.
[82] Shreekant S. Thakkar,et al. Internet Streaming SIMD Extensions , 1999, Computer.
[83] Ruby B. Lee,et al. Refining instruction set architecture for high-performance multimedia processing in constrained environments , 2002, Proceedings IEEE International Conference on Application- Specific Systems, Architectures, and Processors.
[84] Saman P. Amarasinghe,et al. Exploiting superword level parallelism with multimedia instruction sets , 2000, PLDI '00.
[85] Peng Wu,et al. Vectorization for SIMD architectures with alignment constraints , 2004, PLDI '04.
[86] Ruby B. Lee,et al. Algorithmic and architectural enhancements for real-time MPEG-1 decoding on a general purpose RISC workstation , 1995, IEEE Trans. Circuits Syst. Video Technol..
[87] V. Paul Rodriguez. A radix-2 FFT algorithm for Modern Single Instruction Multiple Data (SIMD) architectures , 2002 .
[88] Federico Tombari,et al. Speeding-up NCC-based template matching using parallel multimedia instructions , 2005, Seventh International Workshop on Computer Architecture for Machine Perception (CAMP'05).
[89] Nathan T. Slingerland. Performance Analysis of Instruction Set Architecture Extensions for Multimedia , 2001 .
[90] R. Govindarajan,et al. A Vectorizing Compiler for Multimedia Extensions , 2000, International Journal of Parallel Programming.
[91] J. Stoer,et al. Introduction to Numerical Analysis , 2002 .
[92] S. Krishnaprasad. SIMD programming illustrated using Intel's MMX instruction set , 2004 .
[93] Sameh W. Asaad,et al. An innovative low-power high-performance programmable signal processor for digital communications , 2003, IBM J. Res. Dev..
[94] W. Paul Cockshott,et al. Orthogonal parallel processing in vector Pascal , 2006, Comput. Lang. Syst. Struct..
[95] Alan Jay Smith,et al. Measuring the Performance of Multimedia Instruction Sets , 2002, IEEE Trans. Computers.
[96] Alessandro Lonardo,et al. C++ programming language for an abstract massively parallel SIMD architecture , 2000, ArXiv.
[97] Peter Kogge,et al. Generation of permutations for SIMD processors , 2005, LCTES '05.
[98] Ariel Ortiz Ramirez. An Overview of Intel's MMX Technology , 1999 .
[99] R. Leupers. Code selection for media processors with SIMD instructions , 2000, Proceedings Design, Automation and Test in Europe Conference and Exhibition 2000 (Cat. No. PR00537).
[100] Charles Roth,et al. A low-power, high-speed implementation of a PowerPC™ microprocessor vector extension , 1999, Proceedings 14th IEEE Symposium on Computer Arithmetic (Cat. No.99CB36336).
[101] Xiandong Meng,et al. Optimised fine and coarse parallelism for sequence homology search , 2006, Int. J. Bioinform. Res. Appl..
[102] Hye-Jeong Cho,et al. An Efficient SIMD-based Quarter-Pixel Interpolation Method for H.264/AVC , 2006 .
[103] Insung Ihm,et al. SIMD Optimization of Linear Expressions for Programmable Graphics Hardware , 2004, Comput. Graph. Forum.
[104] Hamid Sarbazi-Azad,et al. Efficient polynomial root finding using SIMD extensions , 2005, 11th International Conference on Parallel and Distributed Systems (ICPADS'05).
[105] Chew Yean Yam. Optimizing Video Compression for Intel® Digital Security Surveillance applications with SIMD and Hyper-Threading Technology , 2005 .
[106] Gang Ren,et al. An empirical study on the vectorization of multimedia applications for multimedia extensions , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[107] Xinmin Tian,et al. Efficient multithreading implementation of H.264 encoder on Intel hyper-threading architectures , 2003, Fourth International Conference on Information, Communications and Signal Processing, 2003 and the Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint.
[108] Henry G. Dietz,et al. Compiling for SIMD Within a Register , 1998, LCPC.