Extensive Positive Selection Drives the Evolution of Nonstructural Proteins in Lineage C Betacoronaviruses.
|Product# 17101 Murine Anti- SARS Monoclonal Antibody|
Middle East respiratory syndrome-related coronavirus (MERS-CoV) (http://talk.ictvonline.org/files/proposals/animal_ssrna_viruses/default.aspx) was identified as the causative agent of a new viral respiratory disease in Saudi Arabia in June 2012 (1). Since then, more than 1,500 cases and 571 deaths have been reported worldwide (as of 12 October 2015 [http://www.who.int/csr/don/12-october-2015-mers-saudi-arabia/en/]), although major outbreaks have been confined to the Middle East and, more recently, to South Korea. The rate of human-to-human transmission of MERS-CoV is relatively low, suggesting that a zoonotic reservoir serves as a major source for transmission (2). Recent studies have indicated that MERS-CoV or a closely related virus originated in bats and possibly spread to humans via transmission from dromedary camels (3).
Like SARS-CoV, which causes severe acute respiratory syndrome (SARS) and evolved in bats as well, MERS-CoV (order Nidovirales, family Coronaviridae, subfamily Coronavirinae) is a positive-sense single-stranded RNA (+ssRNA) virus belonging to the C lineage of the Betacoronavirus genus (4). CoVs are exceptional among RNA viruses for having long (∼30-kb) genomes, a feature associated with a specific genome architecture and with the acquisition of an RNA 3′-to-5′ exoribonuclease activity (exoN) (5). About two-thirds of the CoV genome consists of two large overlapping open reading frames (ORF1a and ORF1b) that are translated into the polyproteins pp1a and pp1ab (the latter synthesized via a −1 ribosome frameshift at the 3′ end of ORF1a). These polyproteins are subsequently cleaved by viral proteases to generate 16 nonstructural proteins (nsp1 to nsp16), most of which compose the viral replication-transcription complex (RTC). With the exception of nsp1 and nsp2, whose functions are poorly understood, most nsps have been characterized in some detail for MERS-CoV, SARS-CoV, and mouse hepatitis virus (MHV). Thus, nsp3, a large multidomain and multifunctional protein, was shown to play essential roles in the virus replication cycle. In fact, the papain-like protease (PLpro) activity of nsp3 is responsible for the initial processing of pp1a. Also, nsp3, together with nsp4 and nsp6, recruits intracellular membranes to anchor the RTC and to form a reticulovesicular network of double-membrane vesicles (DVMs) and convoluted membranes where viral RNA replication occurs (6). nsp5 is a second viral protease (3C-like protease [3CLpro]) that cleaves both pp1a and pp1ab to the final nsp products. nsp7 to nsp11 provide the primer-making activity and regulate the function of the main RNA-dependent RNA polymerase (RdRp; nsp12 protein). Finally, nsp13 to nsp16 comprise RNA-modifying enzymes, including the exoN activity in nsp14 (6, 7).
Viral RNA represents the major pathogen-associated molecular pattern (PAMP) recognized by the host immune system during CoV infection (8). Both MERS-CoV and SARS-CoV elicit limited interferon (IFN) responses in most cell types, indicating that these viruses have evolved efficient strategies to evade innate immune sensing and/or to block IFN induction (8). Indeed, these viruses express antagonists of the IFN response, including SARS-CoV ORF6, ORF3b, and nucleoprotein, as well as MERS-CoV structural and accessory proteins M, ORF4a, ORF4b, and ORF5 (9,–12). Additional immune evasion strategies, though, rely on nsps. In fact, the enzymatic activities of nsp14 and nsp16 endow the viral RNA with a 2′-O-methylated cap structure that mimics cellular mRNAs and avoids activation of the innate immunity receptors RIG-I and MDA5 (8). In analogy to the exoN and endoribonucleases expressed by other viruses, such as Lassa fever virus and pestivirus (13, 14), the RNase activities of nsp14 and nsp15 are also thought to play a role in immune escape by digesting RNA PAMPs (8). Moreover, suppression of IFN responses is mediated by the PLpro in nsp3 through its deubiquitinating and deISGylating activities (15, 16), as well as by nsp1. The latter inhibits IFN-dependent signaling by decreasing the phosphorylation levels of STAT1 and suppresses host protein synthesis (17, 18). Finally, PLpro was shown to physically interact with TRAF3, TBK1, IKKε, STING, and IRF3, which represent key cellular components for IFN response (19).
Therefore, the information encoded by CoV ORF1a and ORF1b is essential for viral replication and for immune evasion. For these reasons, inhibitors that interfere with viral enzymatic activities (e.g., proteases and RdRp) are regarded as promising candidates for therapeutic intervention (20).
From an evolutionary standpoint, different observations suggest that nsps may represent targets of natural selection. First, genes encoding molecules that directly interact with the host immune system are thought to be preferential targets of natural selection as a consequence of host-pathogen arms races (21). Second, adaptation to new hosts in other RNA viruses has been associated with selective changes in polymerase genes (22, 23). Finally, the acquisition of a complex replication machinery is evolutionarily linked to genome expansion in Nidovirales (5). Nonetheless, evolutionary studies have mainly focused on the analysis of betacoronavirus (betaCoV) spike proteins, as these are surface exposed and represent major determinants of host range and tissue tropism (24). In this study, we investigated the evolution of ORF1a and ORF1b in MERS-CoV and in lineage C betaCoVs isolated from bats and hedgehogs. Results indicate widespread positive selection, stronger in ORF1a; within this region, nsp3 represents a preferential selection target, and adaptive evolution is ongoing in MERS-CoV strains circulating in the current outbreak.
MATERIALS AND METHODS
Sequences and alignments.
ORF1a and ORF1b sequences for 7 lineage C betaCoVs and 87 MERS-CoV strains (available as of July 2015) were retrieved from the NCBI database; a list of accession numbers for the complete genomes is provided in Table S1 in the supplemental material.
Alignment errors are common when divergent sequences are analyzed and can affect evolutionary inference. Thus, we used PRANK (25) to generate multiple sequence alignments and GUIDANCE (26) for filtering unreliably aligned codons (i.e., we masked codons with a score of <0.90), as suggested previously (27).
The alignments were screened for the presence of recombination events using two methods based on distinct data features. (i) GARD (genetic algorithm recombination detection) (28) uses phylogenetic incongruence among segments in the alignment to detect the best-fit number and location of recombination breakpoints; the statistical significance of putative breakpoints is then evaluated through Kishino-Hasegawa (HK) tests. (ii) GENECONV (29) tests for significant clustering of substitutions along sequences; statistical significance is assessed through permutation with multiple-comparison correction. For both methods, recombination breakpoints were considered significant if the P value was <0.05. No breakpoint was detected in any analysis.
Detection of positive selection.
Gene trees were generated by maximum likelihood using the program phyML with a general time-reversible (GTR) plus gamma-distributed rates model and 4 substitution rate categories (30).
Positive selection can be defined when the nonsynonymous/synonymous rate ratio (ω) is higher than 1; to analyze the presence of episodic positive selection in lineage C betaCoVs, we applied the branch site test (31) from the PAML suite (32). The test is based on the comparison between two nested models: a model (MA) that allows positive selection on one or more lineages (called foreground lineages) and a model (MA1) that does not allow such positive selection. Twice the difference of likelihood for the two models (Δln) is then compared to a χ2 distribution with one degree of freedom (31). A false-discovery rate (FDR) correction was applied to take into account a multiple-hypothesis issue generated by analyzing different branches on the same phylogeny (33). When the likelihood ratio test suggested the action of positive selection, the Bayes empirical Bayes (BEB) analysis was used to evaluate the posterior probability that each codon belongs to the site class of positive selection on the foreground branch.
BUSTED (branch-site unrestricted statistical test for episodic diversification) (34) is a recently developed software designed to describe episodic positive selection that acts on specific branches in the phylogeny at a proportion of sites within the alignment. An alternative model that allows the action on positive selection on foreground branches is compared with a null model that does not allow an ω of >1. Twice the Δln of the two models is then compared to a χ2 distribution (degrees of freedom = 2); if the null model is rejected, at least one site is under positive selection on the foreground branch(es). To detect selection at individual sites, twice the difference of the likelihood for the alternative and the null model at each site is compared to a χ2 distribution (degree of freedom = 1). BUSTED is implemented in the HYPHY package (35).
Conservatively, we considered a site as selected if it showed a P value of ≤0.05 in BUSTED and a posterior probability of ≥0.90 in the BEB analysis.
The site models implemented in PAML were applied for the analysis of nsp3 sequences from MERS-CoV isolates. To detect selection, two different pairs of nested site models (M1a/M2a and M7/M8) were fitted to the data (32); M2a and M8 allow a class of sites to evolve with an ω of >1, whereas M1a and M7 do not. Positively selected sites were identified using the BEB analysis (from model M8) (36). Sites were validated using MEME (37) (with a cutoff of ≤0.1), which allows the distribution of ω to vary from site to site and from branch to branch at a site.
MEME (37) analyses were performed through the DataMonkey server (38).
Detection of coevolving sites.
To detect coevolving sites in the ORF1a and ORF1b alignments, we applied two different methods: Bayesian graphical model (BGM)-Spidermonkey (39) and the Mutual Information Server To Infer Coevolution (MISTIC) (40). Spidermonkey is a tool implemented in the HYPHY package that identifies coevolving sites from an alignment of coding sequences; a BGM is used to evaluate the connection among codons (represented by the nodes of the network). Significant statistical associations between nodes are indicated by the edges of the network, suggesting functional or structural interactions between codons.
MISTIC estimates the relationship between two or more position in an alignment. The coevolutionary association is estimated by Mutual Information (MI), which evaluates how much the information from the amino acid at the first position can help to predict the amino acid identity at the second position.
For BGM-Spidermonkey, sites were filtered based on a minimum count of 4 substitutions across the phylogeny. To be conservative, we considered a pair of residues as coevolving if they showed a posterior probability of >0.75. This threshold corresponds to 0.02% and 1.42% of all analyzed site pairs in ORF1a and ORF1b, respectively. Likewise, for MISTIC, site pairs were required to display an MI rank higher than the 99th percentile calculated using all MI scores from the alignment. Pairs of sites exceeding the thresholds for both methods were declared to be coevolving.
Membrane topology, glycosylation site predictions, and three-dimensional (3D) structure mapping.
The membrane protein topology for MERS-CoV nsp3 and nsp4 was predicted by using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/) (41). N-Glycosylation sites were predicted with NetNGlyc (http://www.cbs.dtu.dk/services/NetNGlyc/), a program that uses artificial neural networks to examine the sequence context of Asn-X-Ser/Thr motifs.
Sites were mapped onto structures using PyMOL (PyMOL molecular graphics system, version 22.214.171.124; Schrödinger, LLC).
nsp3 in ORF1a is a major selection target in betaCoVs.
Lineage C of betaCoVs includes two bat species closely related to MERS-CoV, namely, Ty-BatCoV HKU4 and Pi-BatCoV HKU5, isolated from the lesser bamboo bats (Tylonycteris pachypus) and Japanese pipistrelles (Pipistrellus abramus), respectively (4). Additional viruses belonging to lineage C of betaCoVs have been described to occur in bats (BtCoV/133 and BtVs-BetaCoV/SC2013) and hedgehogs (hedgehog coronavirus [EriCoV]) (20, 42,–44). Recently, a virus belonging to the same species as MERS-CoV was isolated in Neoromicia bats (NeoCoV) (45). To investigate the evolutionary history of ORF1a, we obtained sequence information for these viruses and for 6 MERS-CoV strains isolated from either humans or camels and belonging to the major groups described to date (46) (Fig. 1). The sequence alignment was pruned of unreliably aligned codons (see Materials and Methods), a procedure that resulted in the masking of almost the entire acidic domain in nsp3. Indeed, this region was previously shown to be highly divergent among CoVs (47). We next analyzed the alignment for the presence of recombination breakpoints using GARD (genetic algorithm recombination detection) (28) and GENECONV (29). No evidence of recombination was detected.
Maximum likelihood phylogeny for nsp3 sequences in a subset of isolates representing MERS-CoV major groups. The amino acid alignment of the region surrounding the two positively selected sites (magenta) in MERS-CoV isolates is also shown. Asterisks indicate viruses isolated from dromedary camels.
Positively selected sites in the context of betaCoV biology.
To shed light on the functional role of selected sites, we exploited available structural, genetic, and biochemical data for nsps (mainly obtained for SARS-CoV and MHV), as well as in silico prediction of transmembrane helices and glycosylation sites.
Most positively selected sites were located in nsp3 (Fig. 3A). In the PLpro domain, four of the positively selected sites, including two that are selected in the viral stains from the current MERS-CoV epidemic, were clustered in a spatially confined region opposed to the catalytic crevice. This region corresponds to the “palm” of the right-handed architecture of the protease, whereas three additional sites are located on the “fingers” (Fig. 3A). One of these three surface-exposed sites, Q830, is part of a noncanonical nsp5 cleavage site, unique for MERS-CoV, that could contribute to the formation of new cleavage products (48) (Fig. 3A).