Improving the scalability of multicore systems with a focus on H.264 video decoding
[1] Milind Girkar,et al. Towards efficient multi-level threading of H.264 encoder on Intel hyper-threading architectures , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[2] Nur Engin,et al. CVP: a programmable Co Vector Processor for 3G mobile baseband processing , 2003 .
[3] T. Fujiyoshi,et al. A 63-mW H.264/MPEG-4 audio/visual codec LSI with module-wise dynamic Voltage/frequency scaling , 2006, IEEE Journal of Solid-State Circuits.
[4] Xiaobo Sharon Hu,et al. Linear-time matrix transpose algorithms using vector register file with diagonal registers , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[5] Stamatis Vassiliadis,et al. The TM3270 media-processor , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[6] Jani Lainema,et al. Adaptive deblocking filter , 2003, IEEE Trans. Circuits Syst. Video Technol.
[7] Paraskevas Evripidou,et al. Chip multiprocessor based on data-driven multithreading model , 2007, Int. J. High Perform. Syst. Archit.
[8] Ajay Luthra,et al. Overview of the H.264/AVC video coding standard , 2003, IEEE Trans. Circuits Syst. Video Technol.
[9] Wonyong Sung,et al. Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware , 2008, CASES '08.
[10] Yen-Kuang Chen,et al. Implementation of H.264 encoder and decoder on personal computers , 2006, J. Vis. Commun. Image Represent.
[11] Eduard Ayguadé,et al. Hierarchical Task-Based Programming With StarSs , 2009, Int. J. High Perform. Comput. Appl.
[12] S. Asano,et al. The design and implementation of a first-generation CELL processor , 2005, ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference, 2005.
[13] Erik B. van der Tol,et al. Mapping of H.264 decoding on a multiprocessor architecture , 2003, IS&T/SPIE Electronic Imaging.
[14] David A. Bader,et al. Optimizing JPEG2000 Still Image Encoding on the Cell Broadband Engine , 2008, 2008 37th International Conference on Parallel Processing.
[15] Rosa M. Badia,et al. CellSs: a Programming Model for the Cell BE Architecture , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[16] Heiko Schwarz,et al. Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard , 2003, IEEE Trans. Circuits Syst. Video Technol.
[17] Gerard de Haan,et al. Application specific instruction-set processor template for motion estimation in video applications , 2005, IEEE Transactions on Circuits and Systems for Video Technology.
[18] C. C. Chi. Parallel H.264 Decoding Strategies for Cell Broadband Engine , 2010 .
[19] Markus Flierl,et al. Generalized B pictures and the draft H.264/AVC video-compression standard , 2003, IEEE Trans. Circuits Syst. Video Technol.
[20] Samuel Williams,et al. The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .
[21] Kevin D. Kissell,et al. MIPS MT: A Multithreaded RISC Architecture for Embedded Real-Time Processing , 2008, HiPEAC.
[22] Yen-Kuang Chen,et al. Implementation of H.264 decoder on general-purpose processors with media instructions , 2003, IS&T/SPIE Electronic Imaging.
[23] Yanjun Zhang,et al. VS-ISA: A Video Specific Instruction Set Architecture for ASIP Design , 2006, IIH-MSP.
[24] Uri C. Weiser,et al. Intel MMX for multimedia PCs , 1997, Commun. ACM.
[25] H. Peter Hofstee. Power-constrained microprocessor design , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.
[26] Roberto Giorgi,et al. DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems , 2007, 19th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'07).
[27] Ben H. H. Juurlink,et al. Parallel Scalability of Video Decoders , 2009, J. Signal Process. Syst.
[28] Dean M. Tullsen,et al. Proximity-aware directory-based coherence for multi-core processor architectures , 2007, SPAA '07.
[29] Lurng-Kuo Liu,et al. Video Analysis and Compression on the STI Cell Broadband Engine Processor , 2006, 2006 IEEE International Conference on Multimedia and Expo.
[30] Magnus Själander,et al. A Look-Ahead Task Management Unit for Embedded Multi-Core Architectures , 2008, 2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools.
[31] M. Moudgill,et al. The Sandblaster 2.0 Architecture and SB3500 Implementation , 2008 .
[32] G. M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[33] G.S. Moschytz,et al. Practical fast 1-D DCT algorithms with 11 multiplications , 1989, International Conference on Acoustics, Speech, and Signal Processing,.
[34] Yan Solihin,et al. Predicting inter-thread cache contention on a chip multi-processor architecture , 2005, 11th International Symposium on High-Performance Computer Architecture.
[35] Hanjin Cho,et al. An area efficient video/audio codec for portable multimedia application , 2000, 2000 IEEE International Symposium on Circuits and Systems. Emerging Technologies for the 21st Century. Proceedings (IEEE Cat No.00CH36353).
[36] Coniferous softwood. GENERAL TERMS , 2003 .
[37] J. M. Pierre Langlois,et al. Application Specific Instruction set processor specialized for block motion estimation , 2008, 2008 IEEE International Conference on Computer Design.
[38] Mateo Valero,et al. HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications , 2007, 2007 IEEE 10th International Symposium on Workload Characterization.
[39] Christopher J. Hughes,et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors , 2007, ISCA '07.
[40] Stamatis Vassiliadis,et al. The CSI multimedia architecture , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[41] Chih-Wei Liu,et al. Multithreaded coprocessor interface for multi-core multimedia SoC , 2008, 2008 Asia and South Pacific Design Automation Conference.
[42] Roberto Giorgi,et al. Introducing Hardware TLP Support in the Cell Processor , 2009, 2009 International Conference on Complex, Intelligent and Software Intensive Systems.
[43] Kurt Keutzer,et al. Efficient Parallelization of H.264 Decoding with Macro Block Level Scheduling , 2007, 2007 IEEE International Conference on Multimedia and Expo.
[44] Lizy Kurian John,et al. Cost-effective hardware acceleration of multimedia applications , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.
[45] Ruby B. Lee,et al. 64-bit and multimedia extensions in the PA-RISC 2.0 architecture , 1996, COMPCON '96. Technologies for the Information Superhighway Digest of Papers.
[46] Andrew Wolfe,et al. Available parallelism in video applications , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.
[47] Hunter Scales,et al. AltiVec Extension to PowerPC Accelerates Media Processing , 2000, IEEE Micro.
[48] Javier D. Bruguera,et al. An FPGA architecture for CABAC decoding in manycore systems , 2008, 2008 International Conference on Application-Specific Systems, Architectures and Processors.
[49] Dongrui Fan,et al. Architectural support for cilk computations on many-core architectures , 2009, PPoPP '09.
[50] Ben H. H. Juurlink,et al. Evaluation of parallel H.264 decoding strategies for the Cell Broadband Engine , 2010, ICS '10.
[51] Jiun-In Guo,et al. An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization , 2004, IEEE Transactions on Circuits and Systems for Video Technology.
[52] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[53] Ben H. H. Juurlink,et al. Analysis of video filtering on the cell processor , 2008, 2008 IEEE International Symposium on Circuits and Systems.
[54] Yuan Shi. Reevaluating Amdahl's Law and Gustafson's Law , 1996 .
[55] I. Daubechies,et al. Factoring wavelet transforms into lifting steps , 1998 .
[56] K. R. Rao,et al. An overview of H.264/MPEG-4 Part 10 , 2003, Proceedings EC-VIP-MC 2003. 4th EURASIP Conference focused on Video/Image Processing and Multimedia Communications (IEEE Cat. No.03EX667).
[57] J. O. Eklundh,et al. A Fast Computer Method for Matrix Transposing , 1972, IEEE Transactions on Computers.
[58] Zhigang Cao,et al. New cost-effective VLSI implementation of a 2-D discrete cosine transform and its inverse , 2004, IEEE Transactions on Circuits and Systems for Video Technology.
[59] Stéphane Mallat,et al. A Theory for Multiresolution Signal Decomposition: The Wavelet Representation , 1989, IEEE Trans. Pattern Anal. Mach. Intell.
[60] Zhuo Zhao,et al. Data partition for wavefront parallelization of H.264 video encoder , 2006, 2006 IEEE International Symposium on Circuits and Systems.
[61] Jong-Myon Kim,et al. Quantized color instruction set for media-on-demand applications , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[62] Ingrid Verbauwhede,et al. Low power DSP's for wireless communications , 2000, ISLPED'00: Proceedings of the 2000 International Symposium on Low Power Electronics and Design (Cat. No.00TH8514).
[63] David A. Patterson,et al. Computer Architecture - A Quantitative Approach (4. ed.) , 2007 .
[64] Lifeng Sun,et al. Spatial and Temporal Data Parallelization of Multi-view Video Encoding Algorithm , 2007, 2007 IEEE 9th Workshop on Multimedia Signal Processing.
[65] Ajay Luthra,et al. The H.264/AVC Advanced Video Coding standard: overview and introduction to the fidelity range extensions , 2004, SPIE Optics + Photonics.
[66] Ben H. H. Juurlink,et al. Extending the Cell SPE with Energy Efficient Branch Prediction , 2010, Euro-Par.
[67] Andrei Sergeevich Terechko,et al. A Multithreaded Multicore System for Embedded Media Processing , 2011, Trans. High Perform. Embed. Archit. Compil.
[68] Rainer Leupers,et al. Task management in MPSoCs: An ASIP approach , 2009, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.
[69] Mateo Valero,et al. Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture , 2009 .
[70] K. Ohmori,et al. A 60 MHz 240 mW MPEG-4 video-phone LSI with 16 Mb embedded DRAM , 2000, 2000 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No.00CH37056).
[71] K. Suzuki,et al. A 2000-MOPS embedded RISC processor with a Rambus DRAM controller , 1999 .
[72] Jeffrey Scott Vitter. Implementations for coalesced hashing , 1982, CACM.
[73] Benjamin C. Lee,et al. Effects of pipeline complexity on SMT/CMP power-performance efficiency , 2005 .
[74] Henrique S. Malvar,et al. Low-complexity transform and quantization with 16-bit arithmetic for H.26L , 2002, Proceedings. International Conference on Image Processing.
[75] H. Takata,et al. The D30V/MPEG multimedia processor , 1999, IEEE Micro.
[76] William J. Dally,et al. Imagine: Media Processing with Streams , 2001, IEEE Micro.
[77] Paraskevas Evripidou,et al. Programming Abstractions and Toolchain for Dataflow Multithreading Architectures , 2009, 2009 Eighth International Symposium on Parallel and Distributed Computing.
[78] Hyunseok Lee,et al. SODA: A High-Performance DSP Architecture for Software-Defined Radio , 2007, IEEE Micro.
[79] B. Flachs,et al. The microarchitecture of the synergistic processor for a cell processor , 2006, IEEE Journal of Solid-State Circuits.
[80] Henk Corporaal,et al. Automatic detection of recurring operation patterns , 1999, Proceedings of the Seventh International Workshop on Hardware/Software Codesign (CODES'99) (IEEE Cat. No.99TH8450).
[81] Soontorn Oraintara,et al. Complexity comparison of fast block-matching motion estimation algorithms , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[82] Wonyong Sung,et al. H.264 decoder optimization exploiting SIMD instructions , 2004, The 2004 IEEE Asia-Pacific Conference on Circuits and Systems, 2004. Proceedings.
[83] Seung-Min Lee,et al. High-speed and low-power real-time programmable video multi-processor for MPEG-2 multimedia chip on 0.6 μm TLM CMOS technology , 1999, Proceedings of the ASP-DAC '99 Asia and South Pacific Design Automation Conference 1999 (Cat. No.99EX198).
[84] John L. Gustafson,et al. Reevaluating Amdahl's law , 1988, CACM.
[85] Digital, MIPS Add Multimedia Extensions: 11/18/96 , 1996 .
[86] Wai-Yip Chan,et al. Performance improvement of the H.264/AVC deblocking filter using SIMD instructions , 2006, 2006 IEEE International Symposium on Circuits and Systems.
[87] Henrique S. Malvar,et al. Low-complexity transform and quantization in H.264/AVC , 2003, IEEE Trans. Circuits Syst. Video Technol.
[88] Alan Jay Smith,et al. Measuring the Performance of Multimedia Instruction Sets , 2002, IEEE Trans. Computers.
[89] Stamatis Vassiliadis,et al. Implementing the 2-D Wavelet Transform on SIMD-Enhanced General-Purpose Processors , 2008, IEEE Transactions on Multimedia.
[90] Mathias Wien,et al. Variable block-size transforms for H.264/AVC , 2003, IEEE Trans. Circuits Syst. Video Technol.
[91] Yang Song,et al. A Hardware Architecture of CABAC Encoding and Decoding with Dynamic Pipeline for H.264/AVC , 2008, J. Signal Process. Syst.
[92] Manuel P. Malumbres,et al. Hierarchical Parallelization of an H.264/AVC Video Encoder , 2006, International Symposium on Parallel Computing in Electrical Engineering (PARELEC'06).
[93] E. Salami,et al. A performance characterization of high definition digital video decoding using H.264/AVC , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005.
[94] H. Peter Hofstee,et al. Power efficient processor architecture and the cell processor , 2005, 11th International Symposium on High-Performance Computer Architecture.
[95] Mateo Valero,et al. Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications , 2007, 2007 IEEE International Symposium on Performance Analysis of Systems & Software.
[96] Hsien-Hsin S. Lee,et al. Extending Amdahl's Law for Energy-Efficient Computing in the Many-Core Era , 2008, Computer.
[97] Youn-Long Lin,et al. A hardware accelerator for context-based adaptive binary arithmetic decoding in H.264/AVC , 2005, 2005 IEEE International Symposium on Circuits and Systems.
[98] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[99] Yen-Kuang Chen,et al. ALP: Efficient support for all levels of parallelism for complex media applications , 2007, TACO.
[100] Andrei Sergeevich Terechko,et al. A Hardware Task Scheduler for Embedded Video Processing , 2008, HiPEAC.
[101] Stamatis Vassiliadis,et al. Performance Impact of Misaligned Accesses in SIMD Extensions , 2006 .
[102] Stamatis Vassiliadis,et al. An 8x8 IDCT Implementation on an FPGA-Augmented TriMedia , 2001, The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01).
[103] Amit Gulati,et al. Efficient mapping of the H.264 encoding algorithm onto multiprocessor DSPs , 2005, IS&T/SPIE Electronic Imaging.
[104] Sergio Bampi,et al. A Pipelined 8x8 2-D Forward DCT Hardware Architecture for H.264/AVC High Profile Encoder , 2007, PSIVT.
[105] Peter Pirsch,et al. Instruction Set Extensions for MPEG-4 Video , 1999, J. VLSI Signal Process.
[106] Michael Roitzsch. Slice-balancing H.264 video encoding for improved scalability of multicore decoding , 2007, EMSOFT '07.
[107] Mateo Valero,et al. Scalability of Macroblock-level Parallelism for H.264 Decoding , 2009, 2009 15th International Conference on Parallel and Distributed Systems.
[108] Vladimir M. Pentkovski,et al. Implementing Streaming SIMD Extensions on the Pentium III Processor , 2000, IEEE Micro.
[109] Klaus Schöffmann,et al. An Evaluation of Parallelization Concepts for Baseline-Profile Compliant H.264/AVC Decoders , 2007, Euro-Par.
[110] Christoforos E. Kozyrakis,et al. Scalable Vector Processors for Embedded Systems , 2003, IEEE Micro.
[111] Mateo Valero,et al. A Highly Scalable Parallel Implementation of H.264 , 2011, Trans. High Perform. Embed. Archit. Compil.