Enhancing GPU Performance by Efficient Hardware-Based and Hybrid L1 Data Cache Bypassing