Due to the large number of chemical substances on the market, fast and reproducible screening is essential to prioritize the chemicals of highest concern for further evaluation. Here, we evaluate the performance of structural similarity models that were developed to identify potential substances of very high concern (SVHC) based on their structural similarity to known SVHCs. These models were developed following a systematic analysis of the performance of 112 different similarity measures for varying SVHC subgroups. The final models consist of the best combinations of fingerprint, similarity coefficient and similarity threshold, and suggested a high predictive performance (≥80%) on an internal dataset consisting of SVHC and non-SVHC substances. However, their application performance on an external dataset had not been evaluated. We therefore evaluated the application performance of the developed similarity models with a 'pseudo-external assessment' on a set of substances (n = 60-100 for the varying SVHC subgroups) that were putatively assessed as SVHC or non-SVHC by consensus scoring based on expert elicitation (n = 30 experts). Expert scores were direct evaluations based on structural similarity to the most similar SVHCs according to the similarity models, and did not involve an extensive evaluation of available data. The use of expert opinions is particularly suitable because it matches the intended purpose of the chemical similarity models: a quick, reproducible and automated screening tool that mimics the expert judgement frequently applied in various screening applications. In addition, model predictions were analyzed qualitatively and discussed using specific examples, to identify the models' strengths and limitations.
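The combination of fingerprint, similarity coefficient and similarity threshold can be illustrated with a minimal sketch. It assumes that fingerprints are represented as sets of on-bit indices and uses the Tanimoto coefficient as one example of a similarity coefficient. The function names, the toy fingerprints and the threshold of 0.5 are hypothetical and are not taken from the models described here.

```python
from typing import Dict, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def screen(
    query: Set[int],
    known_svhcs: Dict[str, Set[int]],
    threshold: float,
) -> Tuple[bool, str, float]:
    """Compare a query fingerprint against all known SVHC fingerprints.

    Returns (flagged, name of most similar SVHC, similarity to that SVHC).
    The query is flagged when its highest similarity reaches the threshold.
    """
    best_name, best_sim = max(
        ((name, tanimoto(query, fp)) for name, fp in known_svhcs.items()),
        key=lambda pair: pair[1],
    )
    return best_sim >= threshold, best_name, best_sim


# Hypothetical toy fingerprints; real fingerprints would come from a cheminformatics toolkit.
KNOWN_SVHCS = {
    "svhc_a": {1, 2, 3, 4},
    "svhc_b": {10, 11, 12},
}

flagged, nearest, similarity = screen({1, 2, 3, 5}, KNOWN_SVHCS, threshold=0.5)
print(flagged, nearest, round(similarity, 2))
```

In this sketch the query shares three of five distinct bits with `svhc_a`, so it is flagged at the assumed threshold of 0.5. Reporting the most similar known SVHC alongside the flag mirrors the way the expert scores were based on the most similar SVHCs.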
The results indicate a good statistical performance for carcinogenic, mutagenic or reprotoxic (CMR) and endocrine disrupting (ED) substances, whereas a moderate performance was observed for (very) persistent, (very) bioaccumulative and toxic (PBT/vPvB) substances when compared to expert opinions. For the PBT/vPvB model, false positives in particular were identified, indicating that model outcomes need further interpretation. The developed similarity models are made available as a freely accessible online tool. In general, the structural similarity models showed great potential for screening and prioritization purposes. The models proved to be effective in identifying groups of substances of potential concern, and could be used to identify follow-up directions for such substances.
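A comparison of model predictions with expert consensus labels can be sketched as follows. The abstract does not state which performance metric was used, so sensitivity, specificity and balanced accuracy are assumptions chosen for illustration, and the example labels are invented rather than taken from the study.

```python
from typing import Dict, Sequence


def performance(predicted: Sequence[bool], expert: Sequence[bool]) -> Dict[str, float]:
    """Compare model flags with expert consensus labels (True = SVHC-like)."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same length")

    tp = sum(1 for p, e in zip(predicted, expert) if p and e)
    fp = sum(1 for p, e in zip(predicted, expert) if p and not e)
    tn = sum(1 for p, e in zip(predicted, expert) if not p and not e)
    fn = sum(1 for p, e in zip(predicted, expert) if not p and e)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one expert-positive and one expert-negative substance")

    sensitivity = tp / (tp + fn)  # share of expert-positive substances that the model flags
    specificity = tn / (tn + fp)  # share of expert-negative substances that the model clears
    return {
        "false_positives": float(fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Invented example: five substances, one false positive.
model_flags = [True, True, False, False, True]
expert_consensus = [True, False, False, False, True]
print(performance(model_flags, expert_consensus))
```

Keeping the false-positive count next to the summary metrics makes a pattern such as the one reported for the PBT/vPvB model visible, where lower specificity rather than missed substances reduces the overall score.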