FlexCore: Massively Parallel and Flexible Processing for Large MIMO Access Points

Large MIMO base stations remain among wireless network designers' best tools for increasing wireless throughput while serving many clients, but current system designs sacrifice throughput by relying on simple linear MIMO detection algorithms. Higher-performance detection techniques are known, but they remain off the table because these systems parallelize their computation at the level of a whole OFDM subcarrier, which suffices only for the less-demanding linear detection approaches they adopt. This paper presents FlexCore, the first computational architecture capable of parallelizing the detection of large numbers of mutually interfering information streams at a granularity below individual OFDM subcarriers, in a nearly embarrassingly parallel manner, while utilizing any number of available processing elements. For 12 clients sending 64-QAM symbols to a 12-antenna base station, our WARP testbed evaluation shows network throughput similar to the state of the art while using an order of magnitude fewer processing elements. For the same scenario, our combined WARP-GPU testbed evaluation demonstrates a 19× computational speedup, with 97% higher energy efficiency than the state of the art. Finally, for the same scenario, an FPGA-based comparison between FlexCore and the state of the art shows that FlexCore can achieve up to 96% better energy efficiency and can offer up to 32× the processing throughput.
