A gene within a gene discovered in SARS-CoV-2

Researchers studying SARS-CoV-2, the virus responsible for the COVID-19 pandemic, have found it contains a previously unknown overlapping gene. Understanding such genes will improve our knowledge of the virus and could even alter how we fight it.


Mary Lou Lang
Dec 3, 2020

The findings of the study were published in eLife. The lead author, Chase Nelson, a postdoctoral research fellow at Academia Sinica in Taipei and a visiting scientist at the American Museum of Natural History, explained the key findings to Current Science Daily.  

Nelson said he and his co-authors set out to answer the question, "Does SARS-CoV-2, the virus responsible for the COVID-19 pandemic, contain overlooked, overlapping genes with functional importance?"

"We hypothesized that such genes might exist, and that understanding them could help answer questions about the emergence of the virus," Nelson said.

He noted that prior to the pandemic he and collaborators Zachary Ardern and Xinzhu Wei had devised a method to study overlapping genes by screening genomes for patterns of genetic change that are unique to overlapping genes. This method was applied to SARS-CoV-2.

Nelson said there were five key findings and listed them for Current Science Daily:

"First, there is an evolutionarily novel overlapping gene named ORF3d in S-ARS-CoV-2 [2019-present], which was not present in SARS-CoV (2002-2003). ORF3d occurs entirely within a somewhat larger gene named ORF3a, which seems to be a hot spot for evolutionary change.

"Second, because virus genomes contain very few genes — in this case ~15 — one new gene could make a big difference. Thus, ORF3d should be a top candidate for investigating unique features of the pandemic virus. Note that the specific function of ORF3d is not known.

"Third, we provide evidence from ribosome profiling data and patterns of evolutionary conservation that the ORF3d protein is manufactured during human infection and that it may have a biological function. We also review evidence from other groups, including the fact that ORF3d induces one of the strongest antibody responses observed in COVID-19 patient sera [Hachim et al. 2020].

"Fourth, the gene in which ORF3d occurs, ORF3a, contains a total of three overlapping genes in SARS-CoV-2 [ORF3c, ORF3d, and a short version of ORF3b]) and two overlapping genes in SARS-CoV [ORF3c and ORF3b]. However, only ORF3b was known before the COVID-19 pandemic. For this reason, numerous studies have used the name ORF3b to refer to ORF3d, apparently assuming they must be the same gene because they both occur within ORF3a. However, they are entirely different genes that occupy entirely different positions within ORF3a. Unfortunately, the result of the ambiguous naming has been that many researchers have assumed that ORF3d [unknown function] has the same function as ORF3b [interferon antagonist], which is not justified and could potentially mislead research into diagnostics and therapeutics. Thus, going forward, it will be critical for researchers to specify precisely which overlapping gene they are referring to in their studies.

"Fifth, while SARS-CoV-2 has gained ORF3d, it has also lost the full-length version of ORF3b from SARS-CoV, which also occurs within ORF3a. Thus, the loss of full-length ORF3b should also be a top candidate for investigating unique features of the pandemic virus."

Asked to explain what an overlapping gene (OLG) is, Nelson said the term means different things to different researchers.

"We reserve the term OLG to mean a protein-coding gene that partially or fully overlaps another protein-coding gene and uses a different reading frame," he said. "With respect to the extent of overlap, ORF3d is an example of a fully overlapping gene, as it occurs entirely inside of ORF3a. With respect to reading frame, protein-coding genes read DNA or RNA nucleotide 'letters' in groups of three, calledcodons. For example, the RNA string GGAUGG contains two codons, GGA and UGG, encoding the amino acids glycine and tryptophan, respectively. 

"Imagine that this is the reading frame of ORF3a. In SARS-CoV-2, ORF3d uses the same letters as ORF3a, but its reading frame is shifted in that its codons always begin at the third letter of ORF3a's codons. In the present example, ORF3d would combine the last letter of GGA [A] with the first two letters of UGG [UG] to form the codon AUG, which encodes methionine. In this way the same DNA or RNA letters can be used to make entirely different protein products. Note that because our definition requires distinct reading frames, mere subsets of genes [e.g. smaller portions of ORF3a in the same reading frame] do not qualify."

It's uncertain if the overlapping gene in SARS-CoV-2 plays a role in the virulence of the virus.

"The specific function of ORF3d is not known," Nelson said. "While it is possible that ORF3d may confer a benefit to the virus, contributing to its virulence, we also know that it is not necessary for the virus to replicate." 

The evidence of that, Nelson explained, is "our study documents a mutation occurring at a site overlapped by three genes — ORF3a, ORF3c and ORF3d — which simultaneously deactivates full-length ORF3d and causes amino acid changes in both ORF3a and ORF3c. This example highlights that mutations in overlapping genes can affect more than one protein product and therefore alter more than one function, making it difficult to study their independent effects. To better answer this question, future studies should compare the characteristics of SARS-CoV-2 isolates which have active vs. inactive ORF3d to determine what [if any] benefit the gene might confer to the virus."
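
A toy example can show how one mutation can alter two overlapping genes at once. The 12-letter sequence below is invented for illustration and is not taken from SARS-CoV-2. It uses the standard genetic code, and the function names are assumptions of this sketch.

```python
from itertools import product

# Standard genetic code, codons ordered U, C, A, G at each position; '*' marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def translate(rna, offset=0):
    """Translate every complete codon starting at `offset` (one-letter amino acid codes)."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(offset, len(rna) - 2, 3))

def substitute(rna, position, base):
    """Return a copy of `rna` with the letter at `position` replaced by `base`."""
    return rna[:position] + base + rna[position + 1:]

original = "GGAUGGGAAGCU"               # hypothetical sequence, not real viral data
mutant = substitute(original, 5, "U")   # a single G -> U change

print(translate(original, 0), translate(mutant, 0))  # GWEA GCEA
print(translate(original, 2), translate(mutant, 2))  # MGS M*S
```

In the first reading frame the change swaps one amino acid (W to C). In the shifted frame the same change creates a stop codon that truncates the second protein, which loosely parallels the kind of mutation Nelson describes.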

The OLG found in SARS-CoV-2 was also found in some pangolin coronaviruses.

"For our study the presence of ORF3d in pangolin coronaviruses from Guangxi allowed us to compare the sequence of ORF3d in SARS-CoV-2 to that of a close relative," he said. "This analysis revealed slightly fewer amino acid changes than expected by chance, suggesting that ORF3d may have a conserved function. However, this does not imply that pangolin was an intermediate host: pangolin coronaviruses from Guangdong are even more closely related to SARS-CoV-2 than those from Guangxi, but lack ORF3d."

In addition, Nelson said that "new unpublished bat coronaviruses from Rwanda and Uganda have an even longer version of ORF3d. Finally, the ORF3a region is prone to lots of recombination, i.e. swapping of genetic material between viruses. Taken together, these considerations do not provide any new compelling evidence regarding a possible pangolin origin of SARS-CoV-2, as its peculiar pattern of occurrence could be just as likely due to repeated gain, repeated loss, or recombination." 

Other researchers could have missed the gene because OLGs are difficult to spot and computational methods are not designed to find them, Nelson said.

"Specifically, when new genomes are annotated, computer programs analyze the sequence and typically assign one longest gene candidate per genome region," he said. "However, it is becoming increasingly clear that this is causing researchers to miss a world of short and overlapping functional genomic elements. OLGs are especially common in viruses, which natural selection has honed to maximize replication efficiency and thus information content per nucleotide. Missing overlapping genes therefore puts us in peril of overlooking important aspects of viral biology."

Regarding his thoughts on the taxonomic distribution and origins of ORF3d, Nelson indicated that in addition to his previous comments on pangolins, "its distribution is puzzling and must involve some combination of repeated gain, repeated loss or recombination. Given the virus genomes currently available to us, ORF3d does not provide any compelling evidence on where SARS-CoV-2 came from." 

The published manuscript stated, "To estimate natural selection on ORF3d, we measured viral diversity at three hierarchical evolutionary levels: between-taxa, between-host, and within-host."

Nelson gave a further explanation of what they meant.

"For viruses, no less than any other biological system, environment is key. It is entirely possible for the same gene to be critical for transmission between hosts [i.e. from one infected individual to another] but not for replication within a host [i.e. within one infected individual]," Nelson said. "Thus, at their best, studies comparing genome sequences to detect natural selection should consider all evolutionary levels [such as] changes the virus undergoes between host species, changes between individual hosts of the same species, and changes within individual hosts. 

"Thus, for example, while we find that ORF3d is not necessary for viral replication within some hosts, it could still provide some advantage within other hosts or during transmission. We just don't know. Unfortunately, working out the function of a gene is very laborious. We hope that molecular biologists will take up this task for ORF3d."

Funding for the research was provided partly by Academia Sinica, the Bavarian State Government, the U.S. National Science Foundation, the National Philanthropic Trust and the University of Wisconsin-Madison, according to Science Daily.
