ViT-P: Classification of Genitourinary Syndrome of Menopause From OCT Images Based on Vision Transformer Models

Genitourinary syndrome of menopause (GSM) is a condition caused by the physiological decline in estrogen levels, and it can impair a woman’s overall health, sexual function, and quality of life. Optical coherence tomography (OCT) systems can now provide real-time optical biopsy images. In this study, we introduce the vision transformer (ViT) to medical OCT imaging for the first time and propose a deep learning-based approach for GSM lesion screening. Specifically, we first build a GSM dataset to train and evaluate the models. To address practical problems that arise during data collection, such as class imbalance, we propose a method that combines dilated (atrous) convolution with a deep convolutional generative adversarial network (DCGAN) classifier to generate the additional samples needed for training. Next, we present ViT PLUS (ViT-P) for the vaginal OCT image classification task; it remedies the weakness of ViT in extracting the patch embedding by using a multibranch convolutional neural network (CNN) combined with a channel attention mechanism. Clinical images acquired by the OCT device are then classified automatically, reducing the workload of gynecologists. Experimental results show that ViT-P outperforms both CNN models and ViT for case screening on the GSM and UCSD datasets, reaching accuracies of 99.9% and 99.69%, respectively.
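
To make the patch-embedding idea concrete, the following is a minimal, dependency-free sketch of a multibranch embedding followed by channel attention. It is not the ViT-P implementation: the abstract does not specify the branch kernels, the attention layout, or the patch size, so the kernel sizes (1, 3, 5), the squeeze-and-excitation style gate, the fixed mean filters standing in for learned convolutions, the random gate weights, and all function names are assumptions chosen for illustration. The linear projection that normally follows patch extraction in ViT is omitted.

```python
import math
import random


def box_conv_same(img, k):
    """Stand-in for one conv branch: k x k mean filter, zero padding, same size."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
            out[y][x] = total / (k * k)
    return out


def channel_attention(channels, w1, w2):
    """Squeeze-and-excitation style gate (assumed form): one weight per channel."""
    # Squeeze: global average pooling over each channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in channels]
    # Excitation: tiny two-layer MLP, ReLU then sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, hidden))))
             for row in w2]
    # Rescale every channel by its gate.
    scaled = [[[g * v for v in row] for row in ch]
              for ch, g in zip(channels, gates)]
    return scaled, gates


def to_patch_tokens(channels, p):
    """Split C x H x W features into non-overlapping p x p patches (one token each)."""
    h, w = len(channels[0]), len(channels[0][0])
    tokens = []
    for y0 in range(0, h - p + 1, p):
        for x0 in range(0, w - p + 1, p):
            tokens.append([ch[y][x] for ch in channels
                           for y in range(y0, y0 + p)
                           for x in range(x0, x0 + p)])
    return tokens


def patch_embed(img, kernel_sizes=(1, 3, 5), patch=4, seed=0):
    """Multibranch features -> channel attention -> patch tokens (illustrative)."""
    rng = random.Random(seed)
    c = len(kernel_sizes)
    hidden = max(1, c // 2)
    # Hypothetical fixed weights; in a trained model these would be learned.
    w1 = [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(c)]
    branches = [box_conv_same(img, k) for k in kernel_sizes]
    scaled, gates = channel_attention(branches, w1, w2)
    return to_patch_tokens(scaled, patch), gates


rng = random.Random(1)
image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8 x 8 image
tokens, gates = patch_embed(image)
print(len(tokens), len(tokens[0]))  # 4 tokens, each of length 3 * 4 * 4 = 48
```

Each branch sees the image at a different receptive field, and the gate lets the embedding weight those views per channel before the tokens are passed to the transformer encoder.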