Restaurant review platforms, such as Yelp and Tripadvisor, routinely receive large numbers of photos in their review submissions. These photos provide significant value for users who seek to compare restaurants. In this context, the choice of cover images (i.e., representative photos of the restaurants) can greatly influence the level of user engagement on the platform. Unfortunately, selecting these images can be time-consuming and often requires human intervention. At the same time, it is challenging to develop a systematic approach to assess the effectiveness of the selected images. In this paper, we collaborate with a large review platform in Asia to investigate this problem. We discuss two image selection approaches, namely crowd-based and AI-based systems. The AI-based system we propose learns complex latent image features and is further enhanced by transfer learning to overcome the scarcity of labeled data. We deploy our AI-based system on the platform through a randomized field experiment to carefully compare the two systems. We find that the AI-based system outperforms its crowd-based counterpart and boosts user engagement by 12.43% to 16.05% on average. We then conduct empirical analyses on observational data to identify the underlying mechanisms that drive the superior performance of the AI-based system. Finally, we infer from our findings that the AI-based system outperforms the crowd-based system for restaurants with (i) a longer tenure on the platform, (ii) more expensive offerings, (iii) lower star ratings, and (iv) limited numbers of user-generated photos.