Energy consumption accounts for a significant portion of operating expenditure (OPEX) in 5G networks, and it will increase further when millimeter-wave (mmWave) base stations are deployed. To reduce this cost, renewable energy has been introduced as a complementary power supply. Nevertheless, how to utilize radio resources and renewable energy cost-efficiently remains a challenge. In this paper, we jointly investigate beamwidth management and resource allocation in mmWave backhaul HetNets with hybrid energy supply, aiming to maximize long-term cost efficiency. This requires solving a mixed stochastic combinatorial optimization problem that captures the causal property of the renewable energy harvesting process, stochastic channel conditions, and imperfect antenna alignment. Given the complexity of the problem and the dynamic environment, we establish a learning system instead of using traditional optimization methods, which typically have exponential worst-case complexity and require complete information. Specifically, we propose a Learning-based Cost-efficient Resource Allocation (LCRA) algorithm that employs a deep neural network to learn policies from experience, ensuring system performance while achieving cost efficiency. To enhance the sampling efficiency and stability of conventional deep reinforcement learning methods for our problem, we propose an improved proximal policy optimization method that reuses historical samples. In particular, a modified clip function is designed to deal with the hard restrictions between the current and old policies. Furthermore, random network distillation is introduced to enhance the exploration capability. Numerical results demonstrate the convergence of LCRA and its superiority in improving cost efficiency in a hybrid-energy-powered mmWave backhaul HetNet compared with state-of-the-art methods.
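
For orientation, the clipped surrogate objective of standard proximal policy optimization, which restricts how far the current policy may move from the old one, can be sketched as follows. This is only the generic PPO form and not the modified clip function designed in this paper; the function name `ppo_clip_objective`, its arguments, and the default `eps=0.2` are illustrative assumptions.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Generic PPO clipped surrogate, averaged over a batch of samples.

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates, one per sample
    eps:        clip range (assumed value; not taken from the paper)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_r = min(max(r, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # A ratio well above 1 + eps with a positive advantage is capped by the clip.
    print(ppo_clip_objective([1.5, 1.0], [1.0, 2.0]))
```

Taking the minimum of the two terms removes the incentive to push the probability ratio outside the clip range, which is the restriction between the current and old policies that a modified clip function would adjust.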