Field-programmable gate arrays (FPGAs) are emerging as a platform for accelerating computationally heavy tasks such as machine learning and cryptography. To make FPGA acceleration as easy as conventional acceleration with graphics processing units, FPGA vendors provide high-level synthesis tools, such as Xilinx’s SDAccel, that synthesize a circuit from a program written in a language such as C, C++, or OpenCL. The benefit of high-level synthesis, however, comes with a stronger abstraction that makes optimization more challenging than in conventional development with a hardware description language, and only a limited number of publications explain how to optimize the performance of a synthesized circuit. In this paper, we take the authenticated encryption algorithm AES-GCM as an example and present a case study on performance optimization: we start from a naive baseline implementation and arrive at a fully pipelined implementation that accepts a 128-bit message block every cycle. The optimized implementation achieves 392.173 MB/s, which is 50 times faster than the baseline implementation.