Reducing thread divergence in a GPU‐accelerated branch‐and‐bound algorithm
[1] Imen Chakroun, et al. Graphics processing unit‐accelerated bounding for branch‐and‐bound applied to a permutation problem using data access optimization, 2014, Concurr. Comput. Pract. Exp.
[2] S. M. Johnson, et al. Optimal two- and three-stage production schedules with setup times included, 1954.
[3] Imen Chakroun, et al. Reducing Thread Divergence in GPU-Based B&B Applied to the Flow-Shop Problem, 2011, PPAM.
[4] Wen-mei W. Hwu, et al. Program optimization carving for GPU computing, 2008, J. Parallel Distributed Comput.
[5] Imen Chakroun, et al. An Adaptative Multi-GPU Based Branch-and-Bound. A Case Study: The Flow-Shop Scheduling Problem, 2012, 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems.
[6] Tianyi David Han, et al. Reducing branch divergence in GPU programs, 2011, GPGPU-4.
[7] Jack Dongarra, et al. Scientific Computing with Multicore and Accelerators, 2010, Chapman and Hall / CRC computational science series.
[8] Tor M. Aamodt, et al. Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[9] Éric D. Taillard, et al. Benchmarks for basic scheduling problems, 1993.
[10] Kevin Skadron, et al. Dynamic warp subdivision for integrated branch and memory divergence tolerance, 2010, ISCA.
[11] Inmaculada García, et al. Branch-and-Bound interval global optimization on shared memory multiprocessors, 2008, Optim. Methods Softw.
[12] B. J. Lageweg, et al. A General Bounding Scheme for the Permutation Flow-Shop Problem, 1978, Oper. Res.
[13] Ravi Sethi, et al. The Complexity of Flowshop and Jobshop Scheduling, 1976, Math. Oper. Res.
[14] Michael J. Quinn, et al. Analysis and Implementation of Branch-and-Bound Algorithms on a Hypercube Multicomputer, 1990, IEEE Trans. Computers.
[15] El-Ghazali Talbi, et al. GPU Computing for Parallel Local Search Metaheuristic Algorithms, 2013, IEEE Transactions on Computers.
[16] Stefan Andersson-Engels, et al. Next-generation acceleration and code optimization for light transport in turbid media using GPUs, 2010, Biomedical optics express.
[17] Norman P. Jouppi, et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0, 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[18] Xipeng Shen, et al. Streamlining GPU applications on the fly: thread divergence elimination through runtime thread-data remapping, 2010, ICS '10.
[19] El-Ghazali Talbi, et al. A Grid-enabled Branch and Bound Algorithm for Solving Challenging Combinatorial Optimization Problems, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.