Abstract For the last two decades, video shot segmentation has been a widely researched topic in the field of content-based video analysis (CBVA). Over time, researchers have sought to improve upon existing shot segmentation methods in order to gain accuracy. Video shot segmentation, or shot boundary analysis, is a fundamental step in CBVA, since any error incurred at this stage reduces the precision of the subsequent steps. The problem becomes even more challenging when detection is required in real time. In this work, a spatiotemporal fuzzy hostility index (STFHI) is proposed for detecting the edges of objects in the frames of a video. The edges present in the frames are treated as features, and the correlation between edge-detected frames is used as a similarity measure. In a real-time scenario, incoming frames are processed as they arrive and similarities are computed between successive frames of the video; these values are assumed to be normally distributed. The gradients of these correlation values are treated as members of a vague set, and the true and false memberships of the elements are computed using a novel approach to obtain a threshold after defuzzification. This threshold, referred to as the vague adaptive threshold (VAT), is updated as new frames are buffered in and is determined by applying the three-sigma rule to the defuzzified membership values. Shot boundaries are then detected on the basis of the VAT. The effectiveness of the proposed real-time video segmentation method is established by an experimental evaluation on a heterogeneous test set of 15 videos drawn from sports, movie songs, music albums, and documentaries. The proposed method achieves an average F1 score of 0.992 on this test set. Videos from the benchmark TRECVID 2001 dataset are selected for comparison with other state-of-the-art methods. The proposed method achieves very high precision and recall on the selected TRECVID 2001 videos, with an average F1 score of 0.939, a substantial improvement over the other existing methods.
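The abstract outlines a detection pipeline (edge detection, frame-to-frame correlation, correlation gradients, an adaptive three-sigma threshold) whose details appear in the body of the paper. The sketch below is only a minimal illustration of that general idea, not the authors' method: it substitutes a plain gradient-magnitude edge map for the proposed STFHI and a simple rolling mean ± 3σ test on the correlation gradients for the vague adaptive threshold (VAT); the function names and the `window` parameter are hypothetical.

```python
import numpy as np

def edge_map(frame):
    """Simple gradient-magnitude edge map.
    Placeholder for the STFHI edge detector proposed in the paper,
    whose details are not reproduced here."""
    gray = frame.mean(axis=-1) if frame.ndim == 3 else frame.astype(float)
    gy, gx = np.gradient(gray)
    return np.hypot(gx, gy)

def correlation(a, b):
    """Pearson correlation between two edge-detected frames."""
    a, b = a.ravel(), b.ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom else 1.0

def detect_boundaries(frames, window=50):
    """Flag a candidate shot boundary when the gradient of the
    frame-to-frame correlation falls outside mean +/- 3*sigma of the
    recent gradients (a plain three-sigma test standing in for the VAT)."""
    boundaries, corrs, grads = [], [], []
    prev_edges = None
    for i, frame in enumerate(frames):
        edges = edge_map(frame)
        if prev_edges is not None:
            corrs.append(correlation(prev_edges, edges))
            if len(corrs) >= 2:
                grads.append(corrs[-1] - corrs[-2])
                recent = grads[-window:]
                mu, sigma = np.mean(recent), np.std(recent)
                if sigma > 0 and abs(grads[-1] - mu) > 3 * sigma:
                    boundaries.append(i)  # abrupt change in similarity
        prev_edges = edges
    return boundaries
```

In this simplified form the threshold adapts as each new frame is buffered in, mirroring the real-time behaviour described above, but it omits the vague-set true/false memberships and the defuzzification step that define the actual VAT.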