Temporal data series and logistic models reveal the dynamics of SARS-CoV-2 spike protein D614G variant in the COVID-19 pandemic
The COVID-19 pandemic is caused by the worldwide spread of the RNA virus SARS-CoV-2. Because of its mutation rate, wide geographical distribution, and variance in host response, this coronavirus is currently evolving into an array of strains with increasing genetic diversity. Most variants appear to have neutral effects on disease spread and symptom severity. However, in the viral spike protein, which is responsible for host cell attachment and invasion, the D614G variant, which carries the amino acid substitution D to G at position 614, has been suggested to increase viral infectivity. Here we propose a novel method to test the epidemiological impact of the emergence of a new variant by combining epidemiological curves (for new cases) with the temporal variation of the relative frequencies of the variants, modeled through logistic regression. We applied our method to the temporal distributions of SARS-CoV-2 D614 and G614 in two geographic areas: the USA (East Coast versus West Coast) and Europe-Asia (East Countries versus West Countries). Our analysis shows that D614G prevalence and the growth rates of the COVID-19 epidemic curves are correlated at the early stages and not correlated at the late stages, in both the USA and Europe-Asia scenarios. These results show that logistic models can reveal the potential selective advantage of D614G, which can explain, at least in part, the impact of this variant on COVID-19 epidemiology.
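To make the logistic-model idea concrete, the following is a minimal sketch of how the relative frequency of a variant over time might be fitted with a logistic regression. It is not the authors' implementation. The function name `fit_logistic`, the variable names, and the sample counts are all hypothetical, and the data are synthetic values chosen only for illustration. The model assumed here is logit(p_t) = a + b·t, where p_t is the fraction of sequenced genomes carrying G614 on day t, and the fit uses a plain Newton-Raphson maximization of the binomial likelihood.

```python
import math


def fit_logistic(days, variant_counts, totals, max_iter=50, tol=1e-10):
    """Fit logit(p_t) = a + b * t to binomial counts by Newton-Raphson.

    days           -- sampling times (e.g. days since first sample)
    variant_counts -- genomes carrying the variant at each time
    totals         -- genomes sequenced at each time
    Returns (a, b): intercept and per-day growth rate of the log-odds.
    """
    a = b = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # Fisher information matrix entries
        for t, k, n in zip(days, variant_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(a + b * t)))
            resid = k - n * p            # observed minus expected count
            w = n * p * (1.0 - p)        # binomial variance weight
            g0 += resid
            g1 += resid * t
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break  # degenerate data; keep the current estimate
        # Newton step: solve the 2x2 system (information) * delta = gradient
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Synthetic, illustrative data only (NOT taken from the study).
days = [0, 10, 20, 30, 40, 50, 60]
g614_counts = [9, 24, 54, 100, 146, 176, 191]
totals = [200] * len(days)

a, b = fit_logistic(days, g614_counts, totals)
midpoint_day = -a / b              # day at which G614 reaches 50% frequency
doubling_time = math.log(2) / b    # days for the G614:D614 odds to double

print(f"intercept a = {a:.3f}, slope b = {b:.4f} per day")
print(f"50% frequency reached near day {midpoint_day:.1f}")
print(f"odds doubling time = {doubling_time:.1f} days")
```

In this sketch a positive slope `b` would be read as the variant gaining frequency over time, which is the kind of signal that can then be compared against the growth rates of the epidemic curves. In practice a statistics package would normally be used instead of a hand-written fit, since it also provides standard errors and confidence intervals for `b`.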