Computer Engineering and Technology

A cache memory system with a die-stacked DRAM L3 cache is a promising way to break the Memory Wall and improve performance. To further optimize the existing memory system, this paper models and analyzes a 3D DRAM used as an L3 cache, based on the DRAMSim2 simulator. To use an on-die DRAM as a cache, tags and data are combined in one row of the DRAM, and the design exploits the wider bus width and denser capacity of 3D DRAM. The cache memory modeling platform is evaluated by running traces, generated by gem5 from SPEC2000, that simulate the access behavior of the core. With the DRAM L3 cache, all test traces show a performance improvement. Read operations achieve an average speedup of 1.82× over the baseline memory system, and write operations achieve 6.38×. The throughput of the 3D DRAM cache improves by up to 1.45× over the baseline system.
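As a rough illustration of what "tags and data combined in one row" can mean, the following Python sketch treats each DRAM row as one cache set whose tags share the row with the data lines. All sizes (a 2048-byte row, 64-byte lines, 6 bytes of tag and state per line), the LRU replacement policy, and the names `ways_per_row` and `RowCache` are assumptions made for this example; they are not taken from the paper or from DRAMSim2.

```python
ROW_BYTES = 2048   # assumed DRAM row (page) size
LINE_BYTES = 64    # assumed cache line size
TAG_BYTES = 6      # assumed tag + state bytes per cache line


def ways_per_row(row_bytes, line_bytes, tag_bytes):
    """Largest number of data lines whose data and tags both fit in one row."""
    slots = row_bytes // line_bytes
    for ways in range(slots, 0, -1):
        # Line-sized slots needed to hold the tags (ceiling division).
        tag_slots = -(-ways * tag_bytes // line_bytes)
        if ways + tag_slots <= slots:
            return ways
    return 0


class RowCache:
    """Hypothetical model: one DRAM row is one set, tags live in the same row."""

    def __init__(self, num_rows, ways):
        self.num_rows = num_rows
        self.ways = ways
        # One tag list per row, ordered from least to most recently used.
        self.rows = [[] for _ in range(num_rows)]

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line (LRU eviction)."""
        line = addr // LINE_BYTES
        row, tag = line % self.num_rows, line // self.num_rows
        tags = self.rows[row]
        hit = tag in tags
        if hit:
            tags.remove(tag)      # re-appended below as most recently used
        elif len(tags) == self.ways:
            tags.pop(0)           # evict the least recently used line
        tags.append(tag)
        return hit


if __name__ == "__main__":
    ways = ways_per_row(ROW_BYTES, LINE_BYTES, TAG_BYTES)
    cache = RowCache(num_rows=1024, ways=ways)
    print(ways, cache.access(0x1000), cache.access(0x1000))
```

Under these assumed sizes, a single row activation exposes both the tags and the candidate data lines, which is the motivation for keeping them in the same row; the price is that a few line-sized slots per row hold tags instead of data.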
