Computational experiments are an important method for the quantitative analysis of complex systems and play a major role in mapping the real world to the virtual world. However, the flexibility of computational experiments leads to arbitrary modeling processes and unconvincing results, which greatly hinder the large-scale application of the method. In this context, the verification of computational models has become an urgent problem in the field. Model evaluation is still in its infancy, and existing evaluation methods are not yet mature. We therefore take epidemic models as the research object and propose a capability maturity evaluation framework for computational models of artificial society. Unlike previous assessment methods, which focus on the validity of results, the framework provides a comprehensive evaluation from two perspectives: 1) evaluation of the model itself, in which comparing the expectation with the final implementation shows whether the model meets the expectation, and 2) comparison between different models, in which evaluating the implementation process of each model and comparing the results identifies the more mature models. The implementation of a model is evaluated in terms of its input, process, and output. Specific analyses and evaluations are then conducted for several representative COVID-19 models to verify the validity of the evaluation framework. The results of the case study show that the proposed framework can help decision-makers identify more mature models that are better suited for reference, and can point out directions in which modelers can improve their models.