pangolin lineage covid

The authors declare no competing interests. Novel Coronavirus (2019-nCoV) Situation Report 1, 21 January 2020 (World Health Organization, 2020). Visual exploration using TempEst (ref. 39) indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. Because the SARS-CoV-2 S protein has been implicated in past recombination events or possibly convergent evolution (ref. 12), we specifically investigated several subregions of the S protein: the N-terminal domain of S1, the C-terminal domain of S1, the variable-loop region of the C-terminal domain, and S2. Evolutionary rate estimation can be profoundly affected by the presence of recombination (ref. 50). EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. Background & objectives: Several phylogenetic classification systems have been devised to trace the viral lineages of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Because 3SEQ is the most statistically powerful of the mosaic methods (ref. 61), we used it to identify the best-supported breakpoint history for each potential child (recombinant) sequence in the dataset. In addition, sequences NC_014470 (Bulgaria 2008), CoVZXC21, CoVZC45 and DQ412042 (Hubei-Yichang) needed to be removed to maintain a clean non-recombinant signal in A. from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no.
The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin lineages were acquired from bat viruses divergent to those that gave rise to SARS-CoV-2. The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus ... Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA; Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China; State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Department of Biology, University of Texas Arlington, Arlington, TX, USA; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK; MRC-University of Glasgow Centre for Virus Research, Glasgow, UK. 206298/Z/17/Z. 725422-ReservoirDOCS). All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). stand-alone pangolin work flows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. collected SARS-CoV data and assisted in analyses of SARS-CoV and SARS-CoV-2 data. Using these breakpoints, the longest putative non-recombining segment (nt 1,885–21,753) is 9.9 kb long, and we call this region NRR2. Forni, D., Cagliani, R., Clerici, M. & Sironi, M.
Molecular evolution of human coronavirus genomes. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic (refs 14–16). Wang, H., Pipes, L. & Nielsen, R. Synonymous mutations and the molecular evolution of SARS-CoV-2 origins. 27) receptors and its RBD being genetically closer to a pangolin virus than to RaTG13 (refs. This new approach classifies the newly sequenced genome against all the diverse lineages present instead of a representative selection of sequences. While such models have recently been made available, we lack the information to calibrate the rate decline over time (for example, through internal node calibrations (ref. 44)). Regions B and C span nt 3,625–9,150 and 9,261–11,795, respectively. The sizes of the black internal node circles are proportional to the posterior node support. The origins we present in Fig. acknowledges support by the Research Foundation-Flanders (Fonds voor Wetenschappelijk Onderzoek-Vlaanderen (nos. 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n = 27 sequences spread over 4 years (MERS-CoV) and n = 27 sequences spread over 49 years (HCoV-OC43). We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new ... For the current pandemic, the novel pathogen identification component of outbreak response delivered on its promise, with viral identification and rapid genomic analysis providing a genome sequence and confirmation, within weeks, that the December 2019 outbreak first detected in Wuhan, China was caused by a coronavirus (ref. 3).
The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. DOI: https://doi.org/10.1038/s41564-020-0771-4. Robertson, D. nCoV's relationship to bat coronaviruses & recombination signals (no snakes) - no evidence the 2019-nCoV lineage is recombinant. Software package for assigning SARS-CoV-2 genome sequences to global lineages. We extracted a similar number (n = 35) of genomes from a MERS-CoV dataset analysed by Dudas et al. (ref. 59) using the phylogenetic diversity analyser tool (ref. 60) (v.0.5). Results and discussion: Genomic surveillance has been a hallmark of the COVID-19 pandemic that, in contrast to other pandemics, achieves tracking of the virus evolution and spread worldwide almost in real time (4). Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. Scientists trying to trace the ancestry of SARS-CoV-2, the virus responsible for COVID-19, have found the pangolin is unlikely to be the source of the virus responsible for the current pandemic. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. The command line tool is open source software available under the GNU General Public License v3.0. Eight other BFRs <500 nt were identified, and the regions were named BFRs A–J in order of length. We thank A. Chan and A. Irving for helpful comments on the manuscript.
The existing diversity and dynamic process of recombination amongst lineages in the bat reservoir demonstrate how difficult it will be to identify viruses with potential to cause major human outbreaks before they emerge. Indeed, the rates reported by these studies are in line with the short-term SARS rates that we estimate (Fig. To begin characterizing any ancestral relationships for SARS-CoV-2, NRRs of the genome must be identified so that reliable phylogenetic reconstruction and dating can be performed. As informative rate priors for the analysis of the sarbecovirus datasets, we used two different normal prior distributions: one with a mean of 0.00078 and s.d. In other words, a true breakpoint is less likely to be called as such (this is breakpoint-conservative), and thus the construction of a non-recombining region may contain true recombination breakpoints (with insufficient evidence to call them as such). Researchers in the UK had just set the scientific world ... Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. The PANGOLIN lineage database (15, 16) was used to analyze the frequency of lineages among countries. Viral metagenomics revealed Sendai virus and coronavirus infection of Malayan pangolins (Manis javanica). To avoid artefacts due to recombination, we focused on NRR1 and NRR2 and the recombination-masked alignment NRA3 to infer time-measured evolutionary histories. Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019). Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref.
The Artic Network receives funding from the Wellcome Trust through project no. If the latter still identified non-negligible recombination signal, we removed additional genomes that were identified as major contributors to the remaining signal. This leaves the insertion of polybasic. Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins. Pangolin relies on a novel algorithm called pangoLEARN. For the DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22 and results with scores <22 were discarded. obtained the genome sequences of 10 SARS-CoV-2 virus strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genome with pangolin software, and the results showed that these virus strains were assigned to B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe (Biazzo et al., 2021). Coronavirus Disease 2019 (COVID-19) Situation Report 51 (World Health Organization, 2020). It is available as a command line tool and a web application. EPI_ISL_410721) and Beijing Institute of Microbiology and Epidemiology (W.-C. Cao, T.T.-Y.L., N. Jia, Y.-W. Zhang, J.-F. Jiang and B.-G. Jiang, nos. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses (refs 43, 44, 52), slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18 years. Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. We considered (1) the possibility that BFRs could be combined into larger non-recombinant regions and (2) the possibility of further recombination within each BFR.
Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). Concurrent evidence also proposed pangolins as a potential intermediate species for SARS-CoV-2 emergence and suggested them as a potential reservoir species (refs 11–13). The new paper finds that the genetic sequences of several strains of coronavirus found in pangolins were between 88.5 percent and 92.4 percent similar to those of the novel coronavirus. There are outstanding evolutionary questions on the recent emergence of human coronavirus SARS-CoV-2 including the role of reservoir species, the role of recombination and its time of divergence from animal viruses. These rate priors are subsequently used in the Bayesian inference of posterior rates for NRR1, NRR2, and NRA3 as indicated by the solid arrows. The presence in pangolins of an RBD very similar to that of SARS-CoV-2 means that we can infer this was also probably in the virus that jumped to humans. Instead, similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). All four of these breakpoints were also identified with the tree-based recombination detection method GARD (ref. 35).
This long divergence period suggests there are unsampled virus lineages circulating in horseshoe bats that have zoonotic potential due to the ancestral position of the human-adapted contact residues in the SARS-CoV-2 RBD. To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. All custom code used in the manuscript is available at https://github.com/plemey/SARSCoV2origins. Pangolin: Phylogenetic Assignment of Named Global Outbreak LINeages. The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance. When the first genome sequence of SARS-CoV-2, Wuhan-Hu-1, was released on 10 January 2020 (GMT) on Virological.org by a consortium led by Zhang (ref. 6), it enabled immediate analyses of its ancestry. With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges (ref. 57). The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history (ref. 40). 4), but also by markedly different evolutionary rates. These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig.
An initial genomic sequence analysis found that the reemergence of COVID-19 in New Zealand was caused by a SARS-CoV-2 from the (now ancestral) lineage B.1.1.1 of the pangolin nomenclature (17). The rate of genome generation is unprecedented, yet there is currently no coherent nor accepted scheme for naming the expanding ... We thank originating laboratories at South China Agricultural University (Y. Shen, L. Xiao and W. Chen; no. It performs: k-mer based detection; map/align and variant calling; consensus sequence generation; lineage/clade analysis using Pangolin and NextClade. Nevertheless, the viral population is largely spatially structured according to provinces in the south and southeast on one lineage, and provinces in the centre, east and northeast on another (Fig. Scientists defined the pangolin lineage of this variant to be B.1.1.523 and it was originally recognized as a variant under monitoring on July 14, 2021. This is evidence for numerous recombination events occurring in the evolutionary history of the sarbecoviruses (refs 22, 33); specifying all past events in their correct temporal order (ref. 34) is challenging and not shown here. Two other bat viruses (CoVZXC21 and CoVZC45) from Zhejiang Province fall on this lineage as recombinants of the RaTG13/SARS-CoV-2 lineage and the clade of Hong Kong bat viruses sampled between 2005 and 2007 (Fig. Boxes show 95% HPD credible intervals. On first examination this would suggest that SARS-CoV-2 is a recombinant of an ancestor of Pangolin-2019 and RaTG13, as proposed by others (refs 11, 22). Open reading frames are shown above the breakpoint plot, with the variable-loop region indicated in the S protein. After removal of A1 and A4, we named the new region A.
Means and 95% HPD intervals are 0.080 [0.058–0.101] and 0.530 [0.304–0.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.109–0.180] and 0.154 [0.093–0.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. Combining regions A, B and C and removing the five named sequences gives us putative NRR1, as an alignment of 63 sequences. The latter was reconstructed using IQ-TREE (ref. 66) v.2.0 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation. The genetic distances between SARS-CoV-2 and Pangolin Guangdong 2019 are consistent across all regions except the N-terminal domain, implying that a recombination event between these two sequences in this region is unlikely. These datasets were subjected to the same recombination masking approach as NRA3 and were characterized by a strong temporal signal (Fig. It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. NTD, N-terminal domain; CTD, C-terminal domain. Bayesian evaluation of temporal signal in measurably evolving populations. All authors contributed to analyses and interpretations. Intraspecies diversity of SARS-like coronaviruses in Rhinolophus sinicus and its implications for the origin of SARS coronaviruses in humans. 2 Lack of root-to-tip temporal signal in SARS-CoV-2.
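The root-to-tip temporal signal referred to above is usually assessed by regressing each tip's genetic distance from the root against its sampling date. The following is a minimal sketch of that regression in plain Python, not the implementation used by TempEst or by the authors; the function name and the toy numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    Returns (slope, intercept, r_squared). The slope is a crude rate
    estimate (substitutions per site per year); an r_squared near zero
    suggests little or no temporal signal.
    """
    if len(dates) != len(divergences) or len(dates) < 3:
        raise ValueError("need at least three paired observations")
    x_bar, y_bar = mean(dates), mean(divergences)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    syy = sum((y - y_bar) ** 2 for y in divergences)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical toy data (not from the study): four tips sampled two years apart.
toy_dates = [2003.0, 2005.0, 2007.0, 2009.0]
toy_divergences = [0.010, 0.012, 0.014, 0.016]
slope, intercept, r_squared = root_to_tip_regression(toy_dates, toy_divergences)
```

With real sarbecovirus data the sampling window is short relative to the depth of the tree, which is why the text reports no visible signal and falls back on informative rate priors.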
The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST (ref. 42) v.1.10.4. c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown (refs 22, 25), have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. 4 we compare these divergence time estimates to those obtained using the MERS-CoV-centred rate priors for NRR1, NRR2 and NRA3. Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. The consistency of the posterior rates for the different prior means also implies that the data do contribute to the evolutionary rate estimate, despite the fact that a temporal signal was visually not apparent (Extended Data Fig. Bryant, D. & Moulton, V. Neighbor-Net: an agglomerative method for the construction of phylogenetic networks.
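Pango lineage names such as B.1.1.523 are hierarchical: dropping the last dot-separated number gives the parent lineage. The sketch below illustrates that naming rule only; it is not part of the pangolin software, the function names are hypothetical, and it deliberately does not resolve aliases (for example, it treats BA.2 as a child of BA rather than expanding the alias).

```python
def pango_ancestors(lineage):
    """Return the textual ancestors of a Pango-style name, nearest first.

    Works purely on the dotted name; alias prefixes are NOT expanded.
    """
    parts = lineage.strip().split(".")
    if not parts[0].isalpha() or not all(p.isdigit() for p in parts[1:]):
        raise ValueError(f"not a Pango-style lineage name: {lineage!r}")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def is_descendant(lineage, ancestor):
    """True if `ancestor` appears among the textual ancestors of `lineage`."""
    return ancestor in pango_ancestors(lineage)


# Example using a lineage named in the text.
ancestors = pango_ancestors("B.1.1.523")
```

A check such as `is_descendant("B.1.1.523", "B.1")` is handy when tallying lineage frequencies at a coarser level than the one pangolin reports.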
The SARS-CoV divergence times are somewhat earlier than dates previously estimated (ref. 15) because previous estimates were obtained using a collection of SARS-CoV genomes from human and civet hosts (as well as a few closely related bat genomes), which implies that evolutionary rates were predominantly informed by the short-term SARS outbreak scale and probably biased upwards. The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. performed codon usage analysis. 36) (RDP, GENECONV, MaxChi, Bootscan, SisScan and 3SEQ) and considered recombination signals detected by more than two methods for breakpoint identification. Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Uncertainty measures are shown in Extended Data Fig. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Discovery and genetic analysis of novel coronaviruses in least horseshoe bats in southwestern China. Region B is 5,525 nt long. Sequences are colour-coded by province according to the map. We find that the sarbecoviruses, the viral subgenus containing SARS-CoV and SARS-CoV-2, undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. When the genomic data included both coding and non-coding regions we used a single GTR+Γ substitution model; for concatenated coding genes we partitioned the alignment by codon position and specified an independent GTR+Γ model for each partition with a separate gamma model to accommodate inter-site rate variation. These residues are also in the Pangolin Guangdong 2019 sequence. BEAST inferences made use of the BEAGLE v.3 library (ref. 68) for efficient likelihood computations.
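The rule described above, keeping only recombination signals detected by more than two of the six methods, amounts to a simple vote count. The sketch below shows that filter under stated assumptions: the input format (pairs of a signal identifier and a method name) and the function name are hypothetical, and this is not how RDP5 itself stores or matches events.

```python
from collections import defaultdict

# The six methods named in the text.
METHODS = ("RDP", "GENECONV", "MaxChi", "Bootscan", "SisScan", "3SEQ")


def consensus_signals(calls, min_methods=3):
    """Keep signals supported by at least `min_methods` distinct methods.

    `calls` is an iterable of (signal_id, method) pairs; "more than two
    methods" corresponds to the default min_methods=3.
    """
    support = defaultdict(set)
    for signal_id, method in calls:
        if method not in METHODS:
            raise ValueError(f"unknown method: {method!r}")
        support[signal_id].add(method)  # duplicates from one method count once
    return {
        signal_id: sorted(methods)
        for signal_id, methods in support.items()
        if len(methods) >= min_methods
    }


# Hypothetical calls: "bp1" is found by three methods, "bp2" by only two.
toy_calls = [
    ("bp1", "RDP"), ("bp1", "MaxChi"), ("bp1", "3SEQ"), ("bp1", "RDP"),
    ("bp2", "RDP"), ("bp2", "GENECONV"),
]
kept = consensus_signals(toy_calls)
```

Requiring agreement across methods trades sensitivity for specificity, which matches the breakpoint-conservative stance taken elsewhere in the text.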
Due to the absence of temporal signal in the sarbecovirus datasets, we used informative prior distributions on the evolutionary rate to estimate divergence dates. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. G066215N, G0D5117N and G0B9317N)) and by the European Union's Horizon 2020 project MOOD (no. Using both prior distributions, this results in six highly similar posterior rate estimates for NRR1, NRR2 and NRA3, centred around 0.00055 substitutions per site per year. Even before the COVID-19 pandemic, pangolins had been making headlines. Note that six of these sequences fall under the terms of use of the GISAID platform. Gray inset shows majority rule consensus trees with mean posterior branch lengths for the two regions, with posterior probabilities on the key nodes showing the relationships among SARS-CoV-2, RaTG13, and Pangolin 2019. Boni, M. F., Zhou, Y., Taubenberger, J. K. & Holmes, E. C. Homologous recombination is very rare or absent in human influenza A virus.