A Power Efficient Linear Equation Solver on a Multi-FPGA Accelerator
Aravind Dasu | Arvind Sudarsanam | Seth Young | Thomas Hauser
[1] C. Siva Ram Murthy,et al. A New Parallel Algorithm for Solving Sparse Linear Systems , 1995, ISCAS.
[2] R. Ernst,et al. A mixed QoS SDRAM controller for FPGA-based high-end image processing , 2003, 2003 IEEE Workshop on Signal Processing Systems (IEEE Cat. No.03TH8682).
[3] Viktor K. Prasanna,et al. A high-performance and energy-efficient architecture for floating-point based LU decomposition on FPGAs , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[4] Yu-Fai Fung,et al. A PC based parallel LU decomposition algorithm for sparse matrices , 2003, 2003 IEEE Pacific Rim Conference on Communications Computers and Signal Processing (PACRIM 2003) (Cat. No.03CH37490).
[5] A. George,et al. Computational Density of Fixed and Reconfigurable Multi-Core Devices for Application Acceleration , 2008 .
[6] Viktor K. Prasanna,et al. Efficient Floating-point Based Block LU Decomposition on FPGAs , 2004, ERSA.
[7] Anjan Bose,et al. Parallel solution of large sparse matrix equations and parallel power flow , 1995 .
[8] W. Gropp,et al. Solution of dense systems of linear equations arising from integral-equation formulations , 1995 .
[9] Viktor Öwall,et al. Implementation of a scalable matrix inversion architecture for triangular matrices , 2003, 14th IEEE Proceedings on Personal, Indoor and Mobile Radio Communications, 2003. PIMRC 2003.
[10] Partha Pratim Pande,et al. Power efficiency in high performance computing , 2012 .
[11] Edusmildo Orozco,et al. Reconfigurable Computing. Accelerating Computation with Field-Programmable Gate Arrays , 2007, Scalable Comput. Pract. Exp..
[12] John Shalf,et al. Power efficiency in high performance computing , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[13] Ahmed El-Amawy. A Systolic Architecture for Fast Dense Matrix Inversion , 1989, IEEE Trans. Computers.
[14] Thomas Hauser,et al. Design of a Portable Cluster Supercomputer for Particle Image Velocimetry Data Processing , 2008, J. Aerosp. Comput. Inf. Commun..
[15] Sotirios G. Ziavras,et al. Performance optimization of an FPGA-based configurable multiprocessor for matrix operations , 2003, Proceedings. 2003 IEEE International Conference on Field-Programmable Technology (FPT) (IEEE Cat. No.03EX798).
[16] Viktor K. Prasanna,et al. Time and Energy Efficient Matrix Factorization Using FPGAs , 2003, FPL.
[17] Gadi Fibich,et al. Efficient Solution of $A x^{(k)} = b^{(k)}$ Using $A^{-1}$ , 2007, J. Sci. Comput..
[18] Zhen Liu,et al. FPGA implementation of hierarchical memory architecture for network processors , 2004, Proceedings. 2004 IEEE International Conference on Field-Programmable Technology (IEEE Cat. No.04EX921).
[19] K. W. Chan. Parallel algorithms for direct solution of large sparse power system matrix equations , 2001 .
[20] Stephen M. Trimberger. Field-Programmable Gate Array Technology , 2007 .
[21] Pedro C. Diniz,et al. Synthesis and estimation of memory interfaces for FPGA-based reconfigurable computing engines , 2003, 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003.
[22] Viktor K. Prasanna,et al. Scalable hybrid designs for linear algebra on reconfigurable computing systems , 2006, 12th International Conference on Parallel and Distributed Systems - (ICPADS'06).
[23] Jack Dongarra,et al. LAPACK: a portable linear algebra library for high-performance computers , 1990, SC.
[24] Volodymyr V. Kindratenko,et al. A case study in porting a production scientific supercomputing application to a reconfigurable computer , 2006, 2006 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[25] Aravind Dasu,et al. Performance of a LU decomposition on a multi-FPGA system compared to a low power commodity microprocessor system , 2007, Scalable Comput. Pract. Exp..
[26] Arvind Sudarsanam,et al. Multi-FPGA based High Performance LU Decomposition , 2006 .
[27] Sotirios G. Ziavras,et al. Parallel LU factorization of sparse matrices on FPGA‐based configurable computing engines , 2004, Concurr. Comput. Pract. Exp..
[28] Viktor K. Prasanna,et al. High-Performance Designs for Linear Algebra Operations on Reconfigurable Hardware , 2008, IEEE Transactions on Computers.
[29] Sotirios G. Ziavras,et al. Parallel LU factorization of sparse matrices on FPGA-based configurable computing engines: Research Articles , 2004 .
[30] Sadaf R. Alam,et al. Using FPGA Devices to Accelerate Biomolecular Simulations , 2007, Computer.
[31] Xin-Qing Sheng,et al. Implementation and experiments of a hybrid algorithm of the MLFMA-enhanced FE-BI method for open-region inhomogeneous electromagnetic problems , 2002 .
[32] Maya Gokhale,et al. Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays , 2005 .
[33] Sotirios G. Ziavras,et al. A configurable multiprocessor and dynamic load balancing for parallel LU factorization , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[34] R. Venkatesh,et al. Parallel matrix inversion techniques , 1996, Proceedings of 1996 IEEE Second International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP '96.
[35] James Demmel,et al. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems , 2003, TOMS.
[36] Karl S. Hemmert,et al. Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.
[37] Aravind Dasu,et al. Memory support design for LU decomposition on the starbridge hyper-computer , 2006, 2006 IEEE International Conference on Field Programmable Technology.
[38] Åke Björck,et al. Numerical Methods , 2021, Markov Renewal and Piecewise Deterministic Processes.
[39] S. G. Kratzer. Massively parallel sparse LU factorization , 1992, [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation.