Entropy-based analysis of SARS-CoV-2 spread in India using informative subtype markers

India became one of the countries most affected by COVID-19, with more than 4 million infected cases and 71,000 deaths by September 2020. We studied the temporal dynamics and geographic distribution of SARS-CoV-2 subtypes in India. Moreover, we analysed the RGD motif and the D614G mutation in the spike protein of SARS-CoV-2. We used a previously proposed viral subtyping method based upon informative subtype markers (ISMs). The ISMs were identified on the basis of information entropy using 94,515 genome sequences of SARS-CoV-2 publicly available at the Global Initiative on Sharing All Influenza Data (GISAID). We identified 11 distinct positions in the SARS-CoV-2 genomes for defining ISMs, resulting in 768 unique ISMs. The most abundant ISM in India was transferred from European countries. In contrast, the second most abundant ISM in India was found to be transferred via Australia. Moreover, the eastern regions of India were infected by the ISM most abundant in China, owing to geographical linkage. Our analysis confirmed higher rates of new cases in countries where the S-G614 strain was abundant than in countries where the S-D614 strain was abundant. In India overall, S-G614 was more prevalent than S-D614, except in a few regions including New Delhi, Bihar, and Rajasthan.
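To illustrate the idea of entropy-based marker selection, the following minimal sketch computes the Shannon entropy of each column in a set of aligned sequences, keeps the highest-entropy columns, and concatenates the bases at those columns into a marker string. It is not the published ISM pipeline: the function names (`column_entropy`, `pick_positions`, `ism_of`), the toy alignment, and the simple top-k selection rule are all assumptions made for illustration, and the handling of gaps and ambiguous bases is omitted.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def pick_positions(aligned, k):
    """Return the k highest-entropy column indices, in genome order.

    Assumption: a plain top-k rule; the actual method may use thresholds
    and additional filtering that are not described here.
    """
    length = len(aligned[0])
    entropies = [column_entropy([seq[i] for seq in aligned]) for i in range(length)]
    ranked = sorted(range(length), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])

def ism_of(seq, positions):
    """Concatenate the bases at the chosen positions into a marker string."""
    return "".join(seq[i] for i in positions)

# Toy alignment (hypothetical data, equal-length sequences).
aligned = ["ACGTA", "ACGTA", "ATGCA", "ATGTA"]
positions = pick_positions(aligned, 2)
isms = Counter(ism_of(s, positions) for s in aligned)
```

In this toy case only two columns vary, so they are the ones selected, and `isms` tallies how often each resulting marker occurs. On real data the same tally, grouped by sampling location and date, would give the kind of subtype abundance discussed above.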