Redundancy of the genetic code


One particular codon codes for only one amino acid, but an amino acid can be coded for by several different codons. According to the genetic code, the codon UUU codes for the amino acid phenylalanine and UUA codes for leucine. But, according to the Wobble Hypothesis, the base at the third position of the codon and that on the anticodon need not be complementary (which helps explain why there are so few types of tRNA molecules, in spite of there being 61 codons). If this hypothesis is true, then we could have a phenylalanine placed in a position that was meant for leucine, and vice versa (since the codons coding for them differ only in their third base). The same holds true for pairs like aspartic acid & glutamic acid and serine & arginine. So how does translation of a particular mRNA molecule result in the right polypeptide sequence?

Wobble pairing is just a phenomenon and not a hard and fast rule. There are some justifications for why it should exist and that is why it is still called a hypothesis. And this statement is not true: "the base on the third position of the codon and that on the anticodon need not be complementary". The anticodon residue corresponding to the third residue of the codon can be a promiscuous base that can pair with two or more different bases. The tRNA for phenylalanine has the anticodon GAA, which can pair with both UUU and UUC but not UUA.

So the statement of the wobble hypothesis is that the first base of the anticodon (often a modified/atypical nucleobase) can show promiscuity of binding.
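The pairing logic in this answer can be sketched in a few lines of Python. This is only an illustration: the `WOBBLE` table encodes Crick's original predictions for the 5' anticodon base (modified bases other than inosine, written `I`, are left out), and the function name `codons_read_by` is made up for the example.

```python
# Strict Watson-Crick partners, used for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon third-position bases that each 5' anticodon base may pair with,
# following Crick's original wobble predictions (I = inosine).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read_by(anticodon):
    """Return the codons (5'->3') that an anticodon (5'->3') can read.

    Pairing is antiparallel: the anticodon's 3' base pairs with codon
    position 1, and its 5' base is the wobble base facing codon position 3.
    """
    wobble_base, middle, three_prime = anticodon
    prefix = WATSON_CRICK[three_prime] + WATSON_CRICK[middle]
    return sorted(prefix + third for third in WOBBLE[wobble_base])


print(codons_read_by("GAA"))  # phenylalanine tRNA anticodon
```

With the anticodon GAA the function lists UUC and UUU only, so the leucine codon UUA is never read by the phenylalanine tRNA.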

You are correct in saying that Crick, in his Wobble Hypothesis, proposed that “the base on the third position of the codon and that on the anticodon need not be complementary”, but the “need not be” in your statement is a paraphrase of the “some” in Crick's original statement:

“It is suggested that while the standard base pairs may be used rather strictly in the first two positions of the triplet, there may be some wobble in the pairing of the third base.”

If you read that paper - or consult the Wikipedia entry under Wobble - you will become aware that Crick is using the word “some” to indicate that:

(i) The wobble proposed is specific for certain base pairs.

(ii) That such wobble base pairs will only be found in cases where they do not violate the genetic code.

The Wobble Hypothesis - as stated above - has been unequivocally shown to be correct. The specific Wobble Rules that Crick proposed to satisfy point (i) were based on an examination of the chemistry of the bases, and have been shown to be partially correct:

Wobble Rules: Crick's original predictions compared with observed 5'-anticodon bases and their base-pairing with codons.

Thus, the prediction that the 5'-tRNA anticodon bases G and I could wobble (and C could not) has been borne out. Crick was aware of the paucity of A at this position in anticodons; both it and U are normally found in chemically modified forms, the base-pairing of which he did not attempt to predict (he was unaware of most of them) and which differs from case to case. The point to remember is that there is a scientific rationale for this in terms of the three-dimensional structure of the anticodon in the tRNA (which holds the first two bases in position by base stacking) and the proximity of the potentially hydrogen-bonding groups in the various bases.

Point (ii) is that Nature will only use Wobble where the genetic code allows. G pairing with C or U always works, whereas I pairing with A, C or U will work with amino acids encoded by all four bases in a block (e.g. Leu, Val, Ser), but not where there are blocks of two (e.g. Tyr, His, Asn).
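Point (ii) can be checked mechanically against the standard codon table. The sketch below is an illustration under stated assumptions: it uses the standard (nuclear) genetic code, takes G to read third-position C or U and inosine to read U, C or A, and the helper name `wobble_is_safe` is invented for the example.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon, third base varying fastest
# in the order U, C, A, G ("*" marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def wobble_is_safe(prefix, third_bases):
    """True if every codon prefix+third encodes the same amino acid."""
    return len({CODE[prefix + third] for third in third_bases}) == 1


boxes = [a + b for a in BASES for b in BASES]
g_always_safe = all(wobble_is_safe(box, "CU") for box in boxes)
inosine_safe = sorted(box for box in boxes if wobble_is_safe(box, "UCA"))

print("G wobble safe in every box:", g_always_safe)
print("Boxes where inosine is safe:", inosine_safe)
```

The inosine list contains the four-codon blocks (such as GU for Val) and the isoleucine box AU, but not UA (Tyr), CA (His) or AA (Asn), where reading a third-position A would insert the wrong amino acid or read through a stop.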

One final point emphasizes the chemical basis of all this: mammalian mitochondria have a different set of wobble rules on account of their peculiarly truncated tRNAs.

Redundancy of the genetic code - Biology

By the end of this section, you will be able to:

  • Explain the “central dogma” of protein synthesis
  • Describe the genetic code and how the nucleotide sequence prescribes the amino acid and the protein sequence

The cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template converts nucleotide-based genetic information into a protein product. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 letters (Figure 1). Each amino acid is defined by a three-nucleotide sequence called the triplet codon. Different amino acids have different chemistries (such as acidic versus basic, or polar versus nonpolar) and different structural constraints. Variation in amino acid sequence gives rise to enormous variation in protein structure and function.

Figure 1. Structures of the 20 amino acids found in proteins are shown. Each amino acid is composed of an amino group (NH3+), a carboxyl group (COO−), and a side chain (blue). The side chain may be nonpolar, polar, or charged, as well as large or small. It is the variety of amino acid side chains that gives rise to the incredible variation of protein structure and function.

10.4: The Genetic Code

  • Contributed by E. V. Wong
  • Axolotl Academica Publishing (Biology) at Axolotl Academica Publishing

We have blithely described the purpose of the DNA chromosomes as carrying the information for building the proteins of the cell, and the RNA as the intermediary for doing so. Exactly how is it, though, that a molecule made up of just four different nucleotides joined together (albeit thousands and even thousands of thousands of them) can tell the cell which of twenty-odd amino acids to string together to form a functional protein? The obvious solution was that since there are not enough individual unique nucleotides to code for each amino acid, there must be combinations of nucleotides that designate particular amino acids. A doublet code would allow for only 16 different combinations (4 possible nucleotides in the first position x 4 possible nucleotides in the second position = 16 combinations) and would not be enough to encode the 20 amino acids. However, a triplet code would yield 64 combinations, easily enough to encode 20 amino acids. So would a quadruplet or quintuplet code, for that matter, but those would be wasteful of resources, and thus less likely. Further investigation proved the existence of a triplet code as described in the table below.
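The counting argument is easy to reproduce. The short Python sketch below enumerates every possible code word of a given length over the four RNA bases and checks whether there are enough to cover 20 amino acids; the function name `codewords` is just an illustrative choice.

```python
from itertools import product

BASES = "ACGU"
AMINO_ACIDS = 20


def codewords(length):
    """All possible nucleotide words of the given length."""
    return ["".join(word) for word in product(BASES, repeat=length)]


for length in (1, 2, 3, 4):
    count = len(codewords(length))
    verdict = "enough" if count >= AMINO_ACIDS else "too few"
    print(f"{length}-letter code: {count} combinations ({verdict})")
```

Length 3 is the shortest word that clears 20, which is the sense in which a quadruplet code would be wasteful.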

With so many combinations and only 20 amino acids, what does the cell do with the other possibilities? The genetic code is a degenerate code, which means that there is redundancy so that most amino acids are encoded by more than one triplet combination (codon). Although it is a redundant code, it is not an ambiguous code: under normal circumstances, a given codon encodes one and only one amino acid. In addition to the 20 amino acids, there are also three “stop codons” dedicated to ending translation. The three stop codons also have colloquial names: UAA (ochre), UAG (amber), UGA (opal), with UAA being the most common in prokaryotic genes.
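"Degenerate but not ambiguous" maps naturally onto a dictionary: each codon is a key with exactly one value, while several keys may share a value. The sketch below builds the standard genetic code from its usual compact one-letter listing and groups codons by amino acid; the variable names are illustrative only.

```python
from collections import defaultdict

BASES = "UCAG"
# Standard genetic code in one-letter amino acid symbols, third base
# varying fastest in the order U, C, A, G; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Invert the table: amino acid -> list of synonymous codons.
synonyms = defaultdict(list)
for codon, amino_acid in CODE.items():
    synonyms[amino_acid].append(codon)
stops = synonyms.pop("*")

print("stop codons:", sorted(stops))
for amino_acid in sorted(synonyms, key=lambda aa: -len(synonyms[aa])):
    print(amino_acid, len(synonyms[amino_acid]), sorted(synonyms[amino_acid]))
```

Only methionine (M) and tryptophan (W) end up with a single codon; every other amino acid has between two and six.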

The colloquial names were started when the discoverers of UAG decided to name the codon after a friend whose last name translated into “amber”. Opal and ochre were named to continue the idea of giving stop codons color names.

The stop codons are sometimes also used to encode what are now considered the 21st and 22nd amino acids, selenocysteine (UGA) and pyrrolysine (UAG). These amino acids have been discovered to be consistently encoded in some species of prokarya and archaea.

Note that there are no dedicated start codons: instead, AUG codes for both methionine and the start of translation, depending on the circumstance, as explained forthwith. The initial Met is a methionine, but in prokaryotes, it is a specially modified formyl-methionine (f-Met). The tRNA is also specialized and is different from the tRNA that carries methionine to the ribosome for addition to a growing polypeptide. Therefore, in referring to a loaded initiator tRNA, the usual nomenclature is fMet-tRNAi or fMet-tRNAf. There also seems to be a little more leeway in defining the start site in prokaryotes than in eukaryotes, as some bacteria use GUG or UUG. Though these codons normally encode valine and leucine, respectively, when they are used as start codons, the initiator tRNA brings in f-Met.

Although the genetic code as described is nearly universal, there are some situations in which it has been modified, and the modifications retained in evolutionarily stable environments. The mitochondria in a broad range of organisms demonstrate stable changes to the genetic code including converting the AGA from encoding arginine into a stop codon and changing AAA from encoding lysine to encoding asparagine. Rarely, a change is found in translation of an organismic (nuclear) genome, but most of those rare alterations are conversions to or from stop codons.

Other minor alterations to the genetic code exist as well, but the universality of the code in general remains. Some mitochondrial DNAs can use different start codons: human mitochondrial ribosomes can use AUA and AUU. In some yeast species, the CGA and CGC codons for arginine are unused. Many of these changes have been cataloged by the National Center for Biotechnology Information (NCBI) based on work by Jukes and Osawa at the University of California at Berkeley (USA) and the University of Nagoya (Japan), respectively.

Neural Induction Embryonic Stem Cells

9.3.1 Neural Induction in Mammalian ESCs

In the absence of conclusive evidence from genetic analysis of the mouse, mostly due to genetic redundancy discussed earlier, ESCs represent perhaps the most amenable model for addressing the sufficiency of BMP inhibition for neural induction in mammals. It is safe to assume that, as with animal caps, differentiation of both mouse and human ESCs follows the hierarchical set of signals that regulate embryonic development in the generation of the germ layers and specific cell types (Thomson and Marshall, 1998; Yu and Thomson, 2008).

Providing a rigorous answer to this question – in mammalian ESCs in general and in hESCs in particular – has been complicated by technical limitations inherent to the in vitro aspects of cell culture. For example, mESCs and hESCs are routinely grown in the presence of 20% serum to provide the nutrients required for growth; thus, they are already exposed to a continuous supply of extrinsic factors. Additionally, pluripotent mESC cultures require leukemia inhibitory factor (LIF) and BMPs, while pluripotent hESCs require Wnt and activin/nodal as well as very high levels of FGF (Singh and Brivanlou, 2010). These extrinsic signaling requirements greatly complicate the analysis and create confusion in the distinction between inducers and modifiers of neural fate specification. Moreover, cell type-specific molecular markers of human embryonic fate, used routinely to diagnose cell fate, are extrapolated from mouse embryos. Until the specificity of these markers in human embryos is validated directly, it is important to remember that this assumption might be misleading. Finally, the lack of in vivo assays to validate in vitro observations represents a major hurdle in the case of hESCs. Differentiations of ESCs as monolayers, embryoid bodies or teratomas are no substitute for in vivo embryonic assays. In vitro cultures and tumors lack the complex germ-layer interactions and morphogenetic movements that occur in vivo, thus failing to facilitate inductive interactions. Attempts to design an in vivo assay for hESCs by formation of very early mouse/human chimerism do not represent a perfect substitute, as they suffer from technical limitations and poor survivability (James et al., 2006). Despite these limitations, ESCs have proven to be powerful tools in our current understanding of differentiation pathways.

Studies addressing the molecular basis of neural specification in mammalian ESCs revolve around three experimental paradigms. The first is coculture of ESCs with different feeder lines that induce neural fate. The second is based on eliminating or minimizing cellular contact by growing ESCs in minimum medium by the use of high-dilution plating (including single cells), thereby mimicking the amphibian animal cap-cell dissociation experiments. Finally, cocktails or individual factors are presented to cells grown in a variety of conditions to test their neural inducing activities. Combinations of these approaches have been used successfully to demonstrate that BMP inhibition is sufficient in both mouse and human ESCs to induce neural fate, providing support for the amphibian data on BMP inhibition and the neural default model. While ultimately coming to conclusions similar to these, we will first discuss the mouse and then describe the human experiments before converging on the default model.

Evolution of genetic redundancy

Genetic redundancy means that two or more genes are performing the same function and that inactivation of one of these genes has little or no effect on the biological phenotype. Redundancy seems to be widespread in genomes of higher organisms. Examples of apparently redundant genes come from numerous studies of developmental biology, immunology, neurobiology and the cell cycle. Yet there is a problem: genes encoding functional proteins must be under selection pressure. If a gene was truly redundant then it would not be protected against the accumulation of deleterious mutations. A widespread view is therefore that such redundancy cannot be evolutionarily stable. Here we develop a simple genetic model to analyse selection pressures acting on redundant genes. We present four cases that can explain why genetic redundancy is common. In three cases, redundancy is even evolutionarily stable. Our theory provides a framework for exploring the evolution of genetic organization.


Model 1. Consider a haploid population with genes at two loci, A and B. Non-functional alleles, a and b, arise at mutation rates $u_a$ and $u_b$. There are four genotypes, AB, Ab, aB and ab. The frequencies are $x_1$, $x_2$, $x_3$ and $x_4$, and the fitnesses are $f_1$, $f_2$, $f_3$ and $f_4$, respectively. In each generation there is mating (with recombination), followed by mutation and selection. Mating is described by the difference equations $x_1' = x_1 + D$, $x_2' = x_2 - D$, $x_3' = x_3 - D$ and $x_4' = x_4 + D$. Here, $D = r(x_2 x_3 - x_1 x_4)$, where $r$ is the recombination rate between the A and B loci, a number between 0 and 0.5. Mutation is described by $x_1' = x_1(1 - u_a)(1 - u_b)$, $x_2' = x_1(1 - u_a)u_b + x_2(1 - u_a)$, $x_3' = x_1 u_a(1 - u_b) + x_3(1 - u_b)$ and $x_4' = x_1 u_a u_b + x_2 u_a + x_3 u_b + x_4$. Selection is described by $x_i' = f_i x_i / \bar{f}$, where $\bar{f} = \sum_i x_i f_i$ denotes the average fitness of the population. Suppose both genes perform function F with equal efficacy. We have $f_1 = f_2 = f_3 = 1$ and $f_4 = 0$. For exactly equal mutation rates, $u_a = u_b = u$, there is a line of equilibria given by $x_1 = x_2 x_3 r(1 - u)/u$. For unequal mutation rates, the gene with the higher mutation rate will become extinct.

Model 2. This has the same framework as model 1, but genes A and B perform function F with different efficacies, $h_a$ and $h_b$. Let $h_a > h_b$. The genotype fitnesses are $f_1 = f_2 = h_a$, $f_3 = h_b$ and $f_4 = 0$. Redundancy can be evolutionarily stable if B has a lower mutation rate than A, $u_b < u_a$. If $1 - (h_b/h_a) > u_a > u_b[1 + (1/r)(h_a - h_b)/h_b]$, the equilibrium is $x_1^* = (1 - x_2^*) \times [h_a(1 - u_a) - h_b(1 - u_b)] / [(h_a - h_b)(1 - u_a)]$, $x_2^* = (1/r)[u_b/(1 - u_b)] \times [h_a(1 - u_a) - h_b(1 - u_b)] / [h_b(u_a - u_b)]$, $x_3^* = 1 - x_1^* - x_2^*$ and $x_4^* = 0$. For low mutation rates, the equilibrium frequency of the redundant AB genotype is approximately $x_1^* \approx 1 - (1/r)[u_b/(u_a - u_b)](h_a - h_b)/h_b$. For example, if $h_a = 1$, $h_b = 0.99$, $u_a = 1.1 \times 10^{-6}$, $u_b = 10^{-6}$ and $r = 0.5$, then the equilibrium frequency of AB is about 0.8.
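The worked example for model 2 can be reproduced numerically. The Python sketch below is a plain transcription of the equilibrium expressions, assuming the differences in them read $h_a - h_b$ and $u_a - u_b$; the function names are invented for illustration and are not taken from the original paper.

```python
def model2_equilibrium(ha, hb, ua, ub, r):
    """Equilibrium frequencies (x1, x2, x3) of AB, Ab and aB in model 2."""
    # Stability window for the redundant equilibrium, as stated in the text.
    if not (1 - hb / ha > ua > ub * (1 + (ha - hb) / (r * hb))):
        raise ValueError("parameters lie outside the stated stability window")
    common = ha * (1 - ua) - hb * (1 - ub)
    x2 = (1 / r) * (ub / (1 - ub)) * common / (hb * (ua - ub))
    x1 = (1 - x2) * common / ((ha - hb) * (1 - ua))
    x3 = 1 - x1 - x2
    return x1, x2, x3


def model2_low_mutation_approx(ha, hb, ua, ub, r):
    """Low-mutation-rate approximation for the frequency of AB."""
    return 1 - (1 / r) * (ub / (ua - ub)) * (ha - hb) / hb


params = dict(ha=1.0, hb=0.99, ua=1.1e-6, ub=1.0e-6, r=0.5)
x1, x2, x3 = model2_equilibrium(**params)
print(f"full expression: x1* = {x1:.4f}")
print(f"approximation:   x1* = {model2_low_mutation_approx(**params):.4f}")
```

Both routes give a frequency of roughly 0.8 for the redundant genotype AB, in line with the figure quoted in the text.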

This model can be expanded to n genes with different mutation rates and different efficacies. The fitness of a particular genotype is given by the efficacy of the most efficient gene. If less efficient genes have lower mutation rates then stability of several redundant genes is possible. For a large number of genes, however, the conditions on efficacies and mutation rates become very restrictive.

Model 3. Consider two genes, A and B, and two functions, F1 and F2. Gene A performs function F1 with efficacy $h_a$, and gene B performs function F1 with a lower efficacy $h_b$ and function F2 with an efficacy of one. Mutations in A lead to the inactive variant a; the mutation rate is $u_a$. Mutations in B can either lead to variant b1, which has lost the ability to perform function F1 but still performs F2, or to variant b2, which is completely inactive; the mutation rates are $u_{b1}$ and $u_{b2}$, respectively. Variant b2 can also arise from b1 at a mutation rate $u_{b3}$. The redundant organization for performing function F1 is evolutionarily stable if $u_{b1} < u_a$. The analysis is similar to model 2 if ub2ub3: for low mutation rates, the equilibrium frequency of AB is approximately $x_1^* \approx 1 - (1/r)[u_{b1}/(u_a - u_{b1})] \times (h_a - h_b)/h_a$. For the same numerical values as model 2, and assuming that $u_{b1}$ is 10 times smaller than $u_a$, we find that the equilibrium frequency of AB is 0.998. Pleiotropy facilitates redundancy.

Model 4. Consider two genes A and B with mutation rates $u_a$ and $u_b$ and developmental error rates $\delta_a$ and $\delta_b$. Mutation and selection are described by the difference equations $x_1' = (1 - \delta_a\delta_b)(1 - u_a)(1 - u_b)x_1/\bar{f}$, $x_2' = (1 - \delta_a)(1 - u_a)(x_1 u_b + x_2)/\bar{f}$, $x_3' = (1 - \delta_b)(1 - u_b)(x_1 u_a + x_3)/\bar{f}$ and $x_4' = 0$, where $\bar{f}$ is such that $x_1 + x_2 + x_3 = 1$. In contrast to models 1–3, recombination is not essential here. The equilibrium frequency of AB is $x_1 = 1/\{1 + [u_a(1 - \delta_b)]/[\delta_b(1 - \delta_a) - u_a(1 - \delta_a\delta_b)] + [u_b(1 - \delta_a)]/[\delta_a(1 - \delta_b) - u_b(1 - \delta_a\delta_b)]\}$. For small values of $u$ and $\delta$, we obtain $x_1 \approx 1/\{1 + [u_a/(\delta_b - u_a)] + [u_b/(\delta_a - u_b)]\}$. Thus necessary conditions for a large $x_1$ are $u_a < \delta_b$ and $u_b < \delta_a$.
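A quick numerical check of model 4 is to iterate the mutation and selection step until the frequencies settle and compare the result with the closed-form equilibrium. The Python sketch below does that under stated assumptions: the Ab update carries the factor $(1 - u_a)$, by symmetry with the aB update and as the stated equilibrium requires, and the parameter values are arbitrary illustrative numbers chosen so that the iteration converges quickly.

```python
def closed_form_x1(ua, ub, da, db):
    """Equilibrium frequency of AB from the closed-form expression."""
    term_a = ua * (1 - db) / (db * (1 - da) - ua * (1 - da * db))
    term_b = ub * (1 - da) / (da * (1 - db) - ub * (1 - da * db))
    return 1 / (1 + term_a + term_b)


def small_rate_x1(ua, ub, da, db):
    """Approximation for small mutation and developmental error rates."""
    return 1 / (1 + ua / (db - ua) + ub / (da - ub))


def iterate_x1(ua, ub, da, db, generations=20000):
    """Iterate mutation + selection, renormalising so x1 + x2 + x3 = 1."""
    x1, x2, x3 = 1.0, 0.0, 0.0  # start from a purely redundant population
    for _ in range(generations):
        n1 = (1 - da * db) * (1 - ua) * (1 - ub) * x1
        n2 = (1 - da) * (1 - ua) * (x1 * ub + x2)  # genotype Ab
        n3 = (1 - db) * (1 - ub) * (x1 * ua + x3)  # genotype aB
        total = n1 + n2 + n3  # plays the role of the normalising factor
        x1, x2, x3 = n1 / total, n2 / total, n3 / total
    return x1


rates = dict(ua=1e-3, ub=1e-3, da=1e-2, db=1e-2)  # illustrative values only
print("iterated:   ", iterate_x1(**rates))
print("closed form:", closed_form_x1(**rates))
print("small-rate: ", small_rate_x1(**rates))
```

With mutation rates ten times smaller than the developmental error rates, AB settles at a little over 80% of the population; raising $u_a$ towards $\delta_b$ drives the first correction term up and the frequency of AB down.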

The model can be extended to n genes. Suppose all genes have mutation rate $u$ and developmental error rate $\delta$. Let $x_i$ denote genotypes with $i$ genes ($i = 0, \ldots, n$). The population dynamics are $x_{n-k}' = (f_{n-k}/\bar{f}) \sum_{i=0}^{k} \binom{n-i}{k-i} u^{k-i} x_{n-i}$, where $f_j = (1 - \delta^j)(1 - u)^j$ and $\bar{f}$ is such that all frequencies add to one. The equilibrium can be solved recursively. An equilibrium with the genotype containing all n redundant genes is possible if $f_n > f_{n-1}$. This leads to $n < 1 + (\log u)/(\log \delta)$.
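The bound on n can be illustrated by comparing the exact condition $f_n > f_{n-1}$ with the logarithmic approximation. The following Python sketch assumes the fitness function $f_j = (1 - \delta^j)(1 - u)^j$; the function names and the example rates are hypothetical choices for the illustration.

```python
import math


def fitness(j, u, delta):
    """Fitness of a genotype carrying j functional redundant genes."""
    return (1 - delta ** j) * (1 - u) ** j


def max_stable_genes(u, delta):
    """Largest n for which adding one more gene still raises fitness."""
    n = 1
    while fitness(n + 1, u, delta) > fitness(n, u, delta):
        n += 1
    return n


def approx_bound(u, delta):
    """Approximate upper bound on n: n < 1 + log(u) / log(delta)."""
    return 1 + math.log(u) / math.log(delta)


u, delta = 1e-6, 0.05  # illustrative values only
print("exact maximum n:", max_stable_genes(u, delta))
print("approximate bound:", round(approx_bound(u, delta), 2))
```

For these rates the exact comparison and the logarithmic bound agree: up to five redundant genes can be maintained, and a sixth would lower fitness because its mutational cost outweighs the tiny reduction in developmental error.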

Diploid models. Our results for haploid models also apply to diploid models. In diploid models, we distinguish four gametes, AB, Ab, aB and ab, which form nine zygotes: AB/AB, AB/Ab, Ab/Ab, AB/aB, aB/aB, AB/ab, Ab/ab, aB/ab and ab/ab. For each generation we assume that mutation acts on gamete frequency, then zygotes are formed, selection acts on zygotes, and finally new gametes are formed, including the possibility of recombination. In agreement with haploid model 1, we find that the case where all zygotes have high fitness except ab/ab which has low fitness, does not lead to stable redundancy. Cases similar to models 2 and 3 give stable redundancy. Diploid models with developmental errors also give stable redundancy.

There are some additional cases that can lead to redundancy in diploid models. One such case was discovered by Brookfield: it assumes that the double heterozygote, AB/ab, is as fit as the wild type, AB/AB, but Ab/ab, aB/ab and ab/ab have low fitness [1]. In addition, stable redundancy is also possible for partial dominance where all homozygotes have high fitness, the double heterozygote has a lower fitness, the single heterozygotes have still lower fitness, and ab/ab has lowest fitness.

Classification of redundancy. It is helpful to distinguish three types of genetic redundancy. (1) ‘True redundancy’ [1] denotes the situation where an individual with a redundant genotype, AB, is not fitter than one in which one of the redundant genes has been knocked out, Ab. In model 2, B is truly redundant, but A is not. In cases with pleiotropy, ‘true redundancy’ implies that the fully redundant genotype is not fitter than a genotype where the pleiotropic function of one gene has been eliminated. (2) ‘Generic redundancy’ is the case when an AB individual is only occasionally fitter than an Ab individual. This can be the consequence of rare developmental errors. Another possibility is that AB is only fitter than Ab in some environments. (3) ‘Almost redundancy’ means that the redundant genotype AB is always slightly fitter than any genotype where one of the redundant genes has been knocked out. Of course, the fitness difference should be small if the situation is to be regarded as one of redundancy. Several such examples have been discussed previously [5].

Dichotomy of Ribosomal Translational Folding

Mechanistic View

Several studies suggest that ribosomes use multiple pathways to promote structural formations in nascent chains. Ribosomes can promote helix formation (Woolhead et al., 2004), compaction of arrested nascent chains (Lu and Deutsch, 2005) and possibly co-translational formation of secondary and some tertiary structures (Evans et al., 2008; Kosolapov and Deutsch, 2009). In prokaryotes, co-translational folding involves trigger factors and chaperones. In eukaryotes, it involves primarily chaperones and binding proteins. For example, the ribosome tunnel acts as a tube that can handle extended conformations and secondary structures of the peptide chain. The tunnel rim consists of RNA and ribosomal proteins. These proteins are interaction sites for ribosome-associated factors used for targeting and folding of the peptide chain. Charge-specific residues of nascent peptides slow down or stop the translation process (Kramer et al., 2009). Other pathways have been observed to regulate protein synthesis and assist with the association of factors such as the Signal Recognition Particle (SRP) (Kramer et al., 2009). Ribosomes transmit signals relating the nascent chain and its position in the tunnel to their surface, thereby controlling the interactions with SRP (Walter and Blobel, 1983; Kramer et al., 2009). The binding of SRP to the nascent protein in eukaryotes can stop the translation process (Kramer et al., 2009). Ribosomal architecture uses feedback through tunnel interactions and protein signaling to control translational folding (Marin, 2008).

Chaperones are also involved in de novo protein folding. Chaperones work cooperatively with ribosomes in proximity to them. These co-translational activities exhibit temporal orchestration and typically act downstream in the folding process. The large number of chaperone mechanisms and their temporal interactions with nascent polypeptide chains act to coordinate co-translational folding during its growth stages. Bacterial trigger factors are ribosome-associated chaperones. They work in conjunction with the nascent chains and their proximity to ribosome exit tunnels. Trigger factor interaction with the ribosome and nascent chain is a function of the chain's length, sequence and folding status (Kaiser et al., 2006; Raine et al., 2006). By reducing the rate of folding in vitro and in vivo, trigger factors have been shown to improve the folding of model multi-domain substrates (Agashe et al., 2004). Prokaryotes use ribosome-bound chaperone trigger factors. Eukaryotes use factors such as J, Hsp70, Hsp40 and nascent chain associated complex (NAC) protein-based systems along with other such mechanisms.

Co-TP can also be induced in response to environmental stress (Liu et al., 2013). Pausing allows cells to adapt to changing environmental conditions such as heat stress. This pause has been observed where the nascent polypeptide emerges from the ribosomal exit tunnel. This has the effect of inhibiting chaperone operation by a dominant-negative mutant or other chemical inhibitors. This suggests a dual role for chaperones in both elongation and co-TP (Liu et al., 2013). Studies have shown that ribosomes can fine-tune the elongation process by sensing and reacting to the intercellular environment.

Internal Control (Nucleotide Arrangement)

TP of nascent proteins has also been linked to the arrangement of nucleotides in the mRNA as well as sections of nucleotide coding regions that destabilize or terminate protein synthesis. Pausing can be induced by mRNA structure (Somogyi et al., 1993), SRP binding (Lipp et al., 1987), mRNA binding proteins, rare codons (Varenne et al., 1984), and anti-Shine-Dalgarno (aSD) codon sequences (Li et al., 2012). It has been shown that replacing rare codons with more abundant codons in Escherichia coli or Saccharomyces cerevisiae results in faster protein translation rates. However, these substitutions also reduce the activity of those proteins (Crombie et al., 1992; Komar et al., 1999). This silent mutagenesis resulted in 20% lower specific activity, with increased levels of misfolding. Further, it has been shown that the folding efficiency of a multi-domain protein in E. coli is perturbed by synonymous substitution of rare codons with codons read by abundant tRNAs (Zhang et al., 2009). However, data show that with fixed levels of tRNAs, synonymously encoded mRNAs translate at different speeds (Sorensen et al., 1989; Sorensen and Pedersen, 1991; Li et al., 2012). For example, a silent mutation in the human gene ABCB1 caused a conformational change in the P-glycoprotein. This protein folded differently because a temporal change in translation affected the timing of the folding process (Kimchi-Sarfaty et al., 2007). Thus, the protein folding pathways are affected by changes in the coding regions of DNA.

An example of particle binding to the mRNA can be found in the 249-nucleotide region of c-myc mRNA known as the coding region instability determinant (CRD) (Lemm and Ross, 2002). It has been hypothesized that TP occurring in the CRD region causes downstream regions of the c-myc mRNA to be susceptible to endonuclease cleavage (Lemm and Ross, 2002). This attack can occur during the pausing time unless certain binding proteins (CRD-BP) are attached to this region and shield it from the endonuclease. The pause sites occur within the CRD c-myc region and map to rare arginine (CGA) and adjacent threonine (ACA) codons (Lemm and Ross, 2002). Data from Lemm and Ross (2002) show that pause sites also occur at different codons within the CRD. The first arginine codon, however, is the strongest site. Changing both the arginine CGA and threonine ACA codons to more common synonymous codons removed the pause. This supports the claim that the CGA and ACA codons are a pause site (Lemm and Ross, 2002), since CGA and ACA produce a pause, while replacing them with synonymous codons produced no pausing effect.

Recent work has built on the above observations, showing a strong relationship between specific arrangements of codons in mRNA and the rate of translation (Li et al., 2012). Codon pairs within the coding regions that are similar to Shine-Dalgarno (SD) sequences have shown a direct correlation to TP. In bacteria, initiation of the translation process is preceded by the acquisition of a six-nucleotide element sequence known as the SD sequence. Normally this SD sequence precedes the coding region of the mRNA transcript and allows ribosome binding at the start codon (Chen et al., 1994). The SD sequence is generally upstream of the start codon AUG (Shine and Dalgarno, 1975). The translation rate is a function of the hybridization free energy of a hexanucleotide to an aSD sequence in the 16S rRNA of the ribosome. These non-uniform rates are dependent upon embedded code (in the form of similar aSD sequences) within the body of the coding regions of the genetic message contained in the mRNA transcript (Li et al., 2012). Transient pauses have been shown to affect co-translational folding of the nascent protein by modulating the elongation process (Li et al., 2012). This temporal control plays a major role in prescribing protein functionality.

TP has been studied in very few non-bacterial species thus far. Shalgi et al. (2013) reported evidence of TP resulting in elongation pausing due to heat shock events in both mouse and human cells. Misfolding of proteins, both in the cytoplasm and during translation, triggers the cell to respond through up-regulated expression of heat shock proteins. During a heat stress event, TP is initiated in the ribosome around codon 65 in most mouse and human cells. This genome-wide phenomenon has been suggested to involve ribosome-associated chaperones. Regulatory mechanisms may be involved in TP around codon 65 of most genes' mRNAs, resulting in elongational pausing in which a particular class of chaperones is employed to respond to heat-induced misfolding (Richter et al., 2010). It remains to be seen whether codons at position 65 exhibit temporal tuning as a function of codon redundancy.

Common Thread Between Mechanistic and Internal Control

A common thread exists between the mechanical execution of the folding process (exit tunnel/factors/chaperones) and the internal mRNA processes involved in folding of the nascent protein. We argue that the causal relationship to co-translational folding is due to a prescribed arrangement of codons within the mRNA. We base this on the fact that trigger factors, chaperones, and binding proteins are all related to the nascent amino acid chain sequence. Amino acid sequence, by necessary consequence, points to mRNA sequences. We further posit that the interactions with translation pausing can be traced back to the specific arrangements of redundant codons in the mRNA, and ultimately to the genome. We propose that the pausing functions are facilitated by first generating a pause state in the translation of the mRNA codons within the ribosome. This gives protein factors, trigger factors and other chaperones the necessary time to mechanically perform folding operations.

“Pausing function” is caused by specific mRNA codon sequences rather than by tunnel-protein interactions with amino acid sequences. This contention is supported by data involving the substitution of rare codons with synonymous codons in E. coli. If the pausing effect were solely related to the amino acid chain sequence, then replacing codons with synonymous codons should still produce the same folded amino acid chain with the same translation speed. However, substitution of rare codons with synonymous codons did produce a change in speed and conformation (Gong and Yanofsky, 2002; Lemm and Ross, 2002; Chiba et al., 2011; Li et al., 2012).

Global analysis in bacteria indicates that 70% of strong pausing occurs when internal SD-like sequences are dominant in the coding regions (Li et al., 2012). It should be noted that canonical SD sites within the body of the coding regions are rare as opposed to low-affinity hexamers having variable rates of occurrence. This logic lines up with the hypothesis that TP causality is due to the codon sequence, which ultimately can be traced back to the genome. This cause and effect relationship provides a coherent explanation for the causality of the TP phenomenon.

Using this reasoning we have examined SD sequences in mRNA driving translation pausing within the ribosome. We examined this phenomenon to determine if a code is involved, with the inherent redundancy of the genetic code. We have examined this data in detail and will show that it exhibits the properties of a code that is used to allow for protein folding. We will show that this code resides in the same ontological prescriptive information (PIo) space as the genetic code used in the protein synthesis process. This dual usage of the same code within the coding regions of genes would normally be controlled semiotically if each codon had only one mapping to its corresponding amino acid. However, the genetic code is known to be redundant, meaning that multiple codons can prescribe the same amino acid. We will show that this redundancy is precisely what allows for the dual functionality of the genetic code to encode simultaneous functions within the same coding space, and using the same string of nucleotides without ambiguity. In doing so, we show why the term “degeneracy” is completely inappropriate. The dual coding functionality of redundancy is anything but “degenerate.” It represents, instead, far more sophistication, layers, and dimensions of formal prescription.

We posit that the translation pausing function is enabled by a code that is superimposed upon the genetic code, yet remains distinct and independent from the genetic code. We further posit that the genetic code consists of multiple threads of information co-existing in the same physical space, which is made possible by the redundancy of the genetic code itself. To support these propositions we begin by examining the data for anti-Shine-Dalgarno (aSD) hexamer sequences to determine the logic and rules that give them the properties of a code.

Genetic code redundancy and its influence on the encoded polypeptides

The genetic code is said to be redundant in that the same amino acid residue can be encoded by multiple, so-called synonymous, codons. If all properties of synonymous codons were entirely equivalent, one would expect that they would be equally distributed along protein coding sequences. However, many studies over the last three decades have demonstrated that their distribution is not entirely random. It has been postulated that certain codons may be translated by the ribosome faster than others and thus their non-random distribution dictates how fast the ribosome moves along particular segments of the mRNA. The reasons behind such segmental variability in the rates of protein synthesis, and thus polypeptide emergence from the ribosome, have been explored by theoretical and experimental approaches. Predictions of the relative rates at which particular codons are translated and their impact on the nascent chain have not arrived at unequivocal conclusions. This is probably due, at least in part, to variation in the basis for classification of codons as "fast" or "slow", as well as variability in the number and types of genes and proteins analyzed. Recent methodological advances have allowed nucleotide-resolution studies of ribosome residency times in entire transcriptomes, which confirm the non-uniform movement of ribosomes along mRNAs and shed light on the actual determinants of rate control. Moreover, experiments have begun to emerge that systematically examine the influence of variations in ribosomal movement and the fate of the emerging polypeptide chain.
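The non-random distribution of synonymous codons can be made concrete by tallying, for each amino acid, what share of its occurrences uses each synonymous codon. The following is a minimal sketch, not a method taken from the studies summarized above; the function name `synonymous_usage` and the toy sequence are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_usage(cds):
    """Return, per codon seen in `cds`, its share among codons for the same amino acid."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    codon_counts = Counter(codons)
    aa_counts = Counter(CODON_TABLE[c] for c in codons)
    return {c: n / aa_counts[CODON_TABLE[c]] for c, n in codon_counts.items()}


# Hypothetical toy sequence: five leucine codons, three of them CUG.
usage = synonymous_usage("CUGCUGCUGCUAUUA")
for codon, share in sorted(usage.items()):
    print(codon, CODON_TABLE[codon], round(share, 2))
```

If synonymous codons were used interchangeably, the shares for one amino acid would be roughly equal across a large sample of genes; skewed shares are the kind of bias the paragraph refers to.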



Cell: The smallest unit of life, consisting of at least a membrane, cytoplasm, and genetic material.

RNA: A nucleic acid of which many different kinds are now known, including messenger RNA, transfer RNA and ribosomal RNA.

Codon: A sequence of 3 DNA or RNA nucleotides that corresponds with a specific amino acid or stop signal during protein synthesis.

Amino acids: Organic compounds that combine to form proteins.

Reading frame: The specific location in DNA where a set of codons will code for a certain protein. The reading frame begins with the start codon (AUG).

DNA: Deoxyribonucleic acid, the molecule carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses.

Messenger RNA (mRNA): A large family of RNA molecules that convey genetic information from DNA to the ribosome, where they specify the amino acid sequence of the protein products of gene expression.


There are many theories behind the origin of genetic codes. The genetic code used by all known forms of life is nearly universal. However, there are a huge number of possible genetic codes. If amino acids are randomly associated with triplet codons, there will be 1.5 × 10^84 possible genetic codes. Phylogenetic analysis of transfer RNA suggests that tRNA molecules evolved before the present set of aminoacyl-tRNA synthetases.
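The text does not say how the figure of roughly 1.5 × 10^84 is obtained. One counting assumption that yields a number of that magnitude is to assign 21 meanings (20 amino acids plus a stop signal) to the 64 codons so that every meaning is used at least once. The sketch below counts such assignments by inclusion-exclusion; the function name and the choice of 21 meanings are assumptions made for illustration.

```python
from math import comb


def count_surjective_codes(n_codons=64, n_meanings=21):
    """Count codon-to-meaning assignments in which every meaning is used at least once."""
    # Inclusion-exclusion over k, the number of meanings deliberately left unused.
    return sum(
        (-1) ** k * comb(n_meanings, k) * (n_meanings - k) ** n_codons
        for k in range(n_meanings + 1)
    )


total = count_surjective_codes()
print(f"{total:.2e}")  # on the order of 1.5e+84
```

Dropping the "used at least once" requirement gives 21^64, which is a few times larger, so the exact figure depends on which assignments are counted as valid codes.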

Theoretically the genetic code could be completely random (a "frozen accident"), completely non-random (optimal) or a combination of random and nonrandom. There are sufficient data to refute the first possibility. For a start, a quick look at the table of the genetic code already shows a clustering of amino acid assignments. Furthermore, amino acids that share the same biosynthetic pathway tend to have the same first base in their codons, and amino acids with similar physical properties tend to have similar codons.

There are four themes running through the many theories that seek to explain the evolution of the genetic code (and hence the origin of these patterns):

1. Chemical principles govern specific RNA interaction with amino acids. Aptamer experiments showed that some amino acids have a selective chemical affinity for the base triplets that code for them. Recent experiments show that of the 8 amino acids tested, 6 show some RNA triplet-amino acid association. This has been called the stereochemical code. The stereochemical code could have created an ancient core of assignments. The current complex translation mechanism involving tRNA and associated enzymes may be a later development; originally, protein sequences may have been directly templated on base sequences.

2. Biosynthetic expansion. The standard modern genetic code grew from a simpler earlier code through a process of "biosynthetic expansion". Here the idea is that primordial life "discovered" new amino acids (e.g., as by-products of metabolism) and later back-incorporated some of these into the machinery of genetic coding. Although much circumstantial evidence has been found to suggest that fewer different amino acids were used in the past than today, precise and detailed hypotheses about exactly which amino acids entered the code in exactly what order have proved far more controversial.

3. Natural selection has led to codon assignments of the genetic code that minimize the effects of mutations. A recent hypothesis suggests that the triplet code was derived from codes that used codons longer than triplets. Longer-than-triplet decoding has a higher degree of codon redundancy and is more error resistant than triplet decoding. This feature could allow accurate decoding in the absence of highly complex translational machinery such as the ribosome.

4. Information channels: Information-theoretic approaches see the genetic code as an error-prone information channel. The inherent noise (i.e. errors) in the channel confronts the organism with a fundamental question: how to construct a genetic code that can withstand the impact of noise while accurately and efficiently translating information? These “rate-distortion” models suggest that the genetic code originated as a result of the interplay of three conflicting evolutionary forces: the needs for diverse amino acids, for error tolerance and for minimal cost of resources. The code emerges at a coding transition when the mapping of codons to amino acids becomes nonrandom. The emergence of the code is governed by the topology defined by the probable errors and is related to the map coloring problem.

Ribonucleic acid (RNA) with two repeating units (UCUCUCU → UCU CUC UCU) produced two alternating amino acids. This, combined with the Nirenberg and Leder experiment, showed that UCU codes for serine and CUC codes for leucine. RNAs with three repeating units (UACUACUA → UAC UAC UAC, or ACU ACU ACU, or CUA CUA CUA) produced three different strings of amino acids. RNAs with four repeating units including UAG, UAA, or UGA produced only dipeptides and tripeptides, thus revealing that UAG, UAA and UGA are stop codons. With this, Khorana and his team had established that the mother of all codes, the biological language common to all living organisms, is spelled out in three-letter words: each set of three nucleotides codes for a specific amino acid. Their Nobel lecture was delivered on December 12, 1968. To do this, Khorana was also the first to synthesize oligonucleotides, that is, strings of nucleotides.
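The logic of the repeating-polymer experiments can be illustrated by listing which triplets appear when a repeated unit is read in each of the three possible frames. This is a small sketch of the reasoning only; the function name `codons_in_frames` and the number of copies are arbitrary choices.

```python
def codons_in_frames(repeat_unit, copies=6):
    """Return, for each reading-frame offset 0-2, the set of triplets in the polymer."""
    polymer = repeat_unit * copies
    return {
        offset: {polymer[i:i + 3] for i in range(offset, len(polymer) - 2, 3)}
        for offset in range(3)
    }


# A two-base repeat yields the same two alternating codons in every frame.
print(codons_in_frames("UC"))
# A three-base repeat yields one codon per frame, hence three different homopolymers.
print(codons_in_frames("UAC"))
```

With the dinucleotide repeat, every frame alternates UCU and CUC, which matches a peptide of two alternating amino acids. With the trinucleotide repeat, each frame contains a single codon (UAC, ACU or CUA), which matches three different products depending on where reading starts.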

The table of the Genetic Code

1st base | 2nd base T | 2nd base C | 2nd base A | 2nd base G
T | TTT (Phe/F) Phenylalanine | TCT (Ser/S) Serine | TAT (Tyr/Y) Tyrosine | TGT (Cys/C) Cysteine
T | TTC (Phe/F) Phenylalanine | TCC (Ser/S) Serine | TAC (Tyr/Y) Tyrosine | TGC (Cys/C) Cysteine
T | TTA (Leu/L) Leucine | TCA (Ser/S) Serine | TAA Ochre (Stop) | TGA Opal (Stop)
T | TTG (Leu/L) Leucine | TCG (Ser/S) Serine | TAG Amber (Stop) | TGG (Trp/W) Tryptophan
C | CTT (Leu/L) Leucine | CCT (Pro/P) Proline | CAT (His/H) Histidine | CGT (Arg/R) Arginine
C | CTC (Leu/L) Leucine | CCC (Pro/P) Proline | CAC (His/H) Histidine | CGC (Arg/R) Arginine
C | CTA (Leu/L) Leucine | CCA (Pro/P) Proline | CAA (Gln/Q) Glutamine | CGA (Arg/R) Arginine
C | CTG (Leu/L) Leucine | CCG (Pro/P) Proline | CAG (Gln/Q) Glutamine | CGG (Arg/R) Arginine
A | ATT (Ile/I) Isoleucine | ACT (Thr/T) Threonine | AAT (Asn/N) Asparagine | AGT (Ser/S) Serine
A | ATC (Ile/I) Isoleucine | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine | AGC (Ser/S) Serine
A | ATA (Ile/I) Isoleucine | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine
A | ATG (Met/M) Methionine | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine | AGG (Arg/R) Arginine
G | GTT (Val/V) Valine | GCT (Ala/A) Alanine | GAT (Asp/D) Aspartic acid | GGT (Gly/G) Glycine
G | GTC (Val/V) Valine | GCC (Ala/A) Alanine | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine
G | GTA (Val/V) Valine | GCA (Ala/A) Alanine | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine
G | GTG (Val/V) Valine | GCG (Ala/A) Alanine | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine
Color key in the original table: nonpolar, polar, basic, acidic, stop codon.

Degeneracy is the redundancy of the genetic code. The genetic code has redundancy but no ambiguity (see the table above for the full correlation). For example, although codons GAA and GAG both specify glutamic acid (redundancy), neither of them specifies any other amino acid (no ambiguity). The codons encoding one amino acid may differ in any of their three positions. For example, the amino acid glutamic acid is specified by GAA and GAG codons (difference in the third position), the amino acid leucine is specified by UUA, UUG, CUU, CUC, CUA, CUG codons (difference in the first or third position), while the amino acid serine is specified by UCA, UCG, UCC, UCU, AGU, AGC (difference in the first, second or third position).

A position of a codon is said to be a fourfold degenerate site if any nucleotide at this position specifies the same amino acid. For example, the third position of the glycine codons (GGA, GGG, GGC, GGU) is a fourfold degenerate site, because all nucleotide substitutions at this site are synonymous i.e., they do not change the amino acid. Only the third positions of some codons may be fourfold degenerate. A position of a codon is said to be a twofold degenerate site if only two of four possible nucleotides at this position specify the same amino acid. For example, the third position of the glutamic acid codons (GAA, GAG) is a twofold degenerate site. In twofold degenerate sites, the equivalent nucleotides are always either two purines (A/G) or two pyrimidines (C/U), so only transversional substitutions (purine to pyrimidine or pyrimidine to purine) in twofold degenerate sites are nonsynonymous.

A position of a codon is said to be a non-degenerate site if any mutation at this position results in amino acid substitution. There is only one threefold degenerate site where changing to three of the four nucleotides may have no effect on the amino acid (depending on what it is changed to), while changing to the fourth possible nucleotide always results in an amino acid substitution. This is the third position of an isoleucine codon: AUU, AUC, or AUA all encode isoleucine, but AUG encodes methionine. In computation this position is often treated as a twofold degenerate site.
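The fourfold, threefold, twofold and non-degenerate classes for the third codon position can be checked directly against the standard code. The sketch below counts how many third-position bases preserve the amino acid; the compact table encoding and the function name `third_position_degeneracy` are choices made for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def third_position_degeneracy(codon):
    """Count third-position bases (including the current one) that keep the same meaning."""
    meaning = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:2] + base] == meaning for base in BASES)


print(third_position_degeneracy("GGA"))  # 4: glycine, fourfold degenerate
print(third_position_degeneracy("GAA"))  # 2: glutamic acid, twofold degenerate
print(third_position_degeneracy("AUU"))  # 3: isoleucine, the threefold case
print(third_position_degeneracy("AUG"))  # 1: methionine, non-degenerate
```

Scanning all 64 codons with this function returns a count of 3 only for AUU, AUC and AUA, in line with the statement that isoleucine provides the only threefold degenerate site.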

There are three amino acids encoded by six different codons: serine, leucine, and arginine. Only two amino acids are specified by a single codon. One of these is the amino acid methionine, specified by the codon AUG, which also specifies the start of translation; the other is tryptophan, specified by the codon UGG. The degeneracy of the genetic code is what accounts for the existence of synonymous mutations.
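These counts can be verified by grouping the 64 codons by the amino acid they encode. The following is a short sketch using one-letter amino acid symbols; the variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid (or stop signal) they specify.
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

sixfold = sorted(aa for aa, cs in codons_for.items() if aa != "*" and len(cs) == 6)
single = sorted(aa for aa, cs in codons_for.items() if len(cs) == 1)

print(sixfold)  # ['L', 'R', 'S'] = leucine, arginine, serine
print(single)   # ['M', 'W'] = methionine, tryptophan
```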

Degeneracy results because there are more codons than encodable amino acids. For example, if there were two bases per codon, then only 16 amino acids could be coded for (4²=16). Because at least 21 codes are required (20 amino acids plus stop), and the next largest number of bases is three, then 4³ gives 64 possible codons, meaning that some degeneracy must exist.
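The arithmetic in this argument is small enough to check mechanically. The sketch below finds the shortest codon length whose number of combinations covers a required number of meanings; the function name `min_codon_length` is hypothetical.

```python
def min_codon_length(n_meanings, alphabet_size=4):
    """Smallest codon length L such that alphabet_size ** L >= n_meanings."""
    length = 1
    while alphabet_size ** length < n_meanings:
        length += 1
    return length


required = 21  # 20 amino acids plus a stop signal
length = min_codon_length(required)
print(length)                  # 3
print(4 ** length)             # 64 possible codons
print(4 ** length - required)  # 43 codons beyond the minimum, hence degeneracy
```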

These properties of the genetic code make it more fault-tolerant for point mutations. For example, in theory, fourfold degenerate codons can tolerate any point mutation at the third position, although codon usage bias restricts this in practice in many organisms; twofold degenerate codons can tolerate one out of the three possible point mutations at the third position. Since transition mutations (purine to purine or pyrimidine to pyrimidine mutations) are more likely than transversion (purine to pyrimidine or vice versa) mutations, the equivalence of purines or that of pyrimidines at twofold degenerate sites adds further fault tolerance.
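The link between twofold degenerate sites and transitions can be shown by enumerating every third-position substitution of a codon and labeling it. This is a minimal sketch; the helper names `is_transition` and `third_position_changes` are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def is_transition(old, new):
    """True for purine-to-purine or pyrimidine-to-pyrimidine changes."""
    return old != new and (old in PURINES) == (new in PURINES)


def third_position_changes(codon):
    """List (new_base, kind, is_synonymous) for each third-position substitution."""
    changes = []
    for base in BASES:
        if base == codon[2]:
            continue
        kind = "transition" if is_transition(codon[2], base) else "transversion"
        synonymous = CODON_TABLE[codon[:2] + base] == CODON_TABLE[codon]
        changes.append((base, kind, synonymous))
    return changes


for change in third_position_changes("GAA"):  # glutamic acid, twofold degenerate
    print(change)
```

For GAA, only the A to G transition is synonymous, while both transversions change glutamic acid to aspartic acid.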

Despite the redundancy of the genetic code, single point mutations can still cause dysfunctional proteins. For example, a mutated hemoglobin gene causes sickle-cell disease. In the mutant hemoglobin a hydrophilic glutamate (Glu) is substituted by the hydrophobic valine (Val), that is, GAA or GAG becomes GUA or GUG. The substitution of glutamate by valine reduces the solubility of β-globin, which causes hemoglobin to form linear polymers linked by the hydrophobic interaction between the valine groups, causing sickle-cell deformation of erythrocytes. Sickle-cell disease is generally not caused by a de novo mutation. Rather, it is selected for in geographic regions where malaria is common (in a way similar to thalassemia), as heterozygous people have some resistance to the malarial Plasmodium parasite (heterozygote advantage). [5]
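The difference between a tolerated and a damaging point mutation can be illustrated by classifying single-base changes in a glutamate codon. The sketch below is illustrative only; `classify_point_mutation` is a hypothetical helper and positions are counted from 0.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, position, new_base):
    """Return (mutant_codon, kind), where kind is synonymous, missense or nonsense."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return mutant, kind


print(classify_point_mutation("GAG", 2, "A"))  # ('GAA', 'synonymous'): still Glu
print(classify_point_mutation("GAG", 1, "U"))  # ('GUG', 'missense'): Glu to Val
print(classify_point_mutation("GAG", 0, "U"))  # ('UAG', 'nonsense'): Glu to stop
```

The second call corresponds to the GAG to GUG change described for sickle-cell hemoglobin.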

These variable codes for amino acids are allowed because of modified bases in the first base of the anticodon of the tRNA, and the base pair formed is called a wobble base pair. Wobble pairing involves modified bases such as inosine as well as the non-Watson-Crick U-G base pair. [6]
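How a single anticodon can read several codons can be sketched with the pairing rules of Crick's original wobble proposal, in which the first anticodon base (5' end) pairs with the third codon base. The rule table below follows that proposal and is a simplification: modified uridines and other modified bases found in real tRNAs pair differently. The function name `codons_read` is an assumption for illustration.

```python
# Strict Watson-Crick pairing for codon positions 1 and 2.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Wobble pairing for the first anticodon base (I = inosine), per Crick's original rules.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read(anticodon):
    """Return the codons (5' to 3') an anticodon (written 5' to 3') can pair with."""
    first, second, third = anticodon
    # Pairing is antiparallel: anticodon base 3 faces codon base 1, and so on.
    stem = WATSON_CRICK[third] + WATSON_CRICK[second]
    return sorted(stem + base for base in WOBBLE[first])


print(codons_read("GAA"))  # ['UUC', 'UUU']: both phenylalanine codons, not UUA
print(codons_read("IGC"))  # ['GCA', 'GCC', 'GCU']: three alanine codons
```

Under these rules the first two codon positions are always read strictly, which is why wobble does not mix up amino acids whose codons differ at a twofold site split between purines and pyrimidines.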

Initiation or Start Codon

The start codon is generally defined as the point in the sequence at which a ribosome begins to translate RNA into amino acids. When an RNA transcript is "read" by the ribosome from the 5' end to the 3' end, the start codon is the first codon at which the tRNA carrying methionine (Met) and the ribosomal subunits assemble. ATG and AUG denote the DNA and RNA sequences, respectively, of the start codon or initiation codon, which encodes the amino acid methionine (Met) in eukaryotes and a modified Met (fMet) in prokaryotes. The principle called the central dogma of molecular biology describes the flow of information from a gene to a protein. Specific sequences of DNA act as a template to synthesize mRNA in a process termed "transcription", which in eukaryotes takes place in the nucleus. This mRNA is exported from the nucleus into the cytoplasm of the cell and acts as a template to synthesize protein in a process called "translation." Three nucleotide bases specify one amino acid in the genetic code, a mapping encoded in the tRNAs of the organism. The first three bases of the coding sequence (CDS) of mRNA to be translated into protein are called a start codon or initiation codon. The start codon is almost always preceded by a 5' untranslated region (5' UTR). The start codon is typically AUG (or ATG in DNA; this also encodes methionine). Non-AUG start codons are used very rarely in higher organisms (eukaryotes). In prokaryotes, alternative start codons, mainly GUG and UUG, are used in addition to AUG. For example, E. coli uses 83% ATG (AUG), 14% GTG (GUG), 3% TTG (UUG) and one or two others (e.g., ATT and CTG).

Termination or Stop codon

In the genetic code, a stop codon (also known as a termination codon) is a nucleotide triplet within messenger RNA that signals the termination of translation. Proteins are based upon polypeptides, which are unique sequences of amino acids. Most codons in messenger RNA correspond to the addition of an amino acid to a growing polypeptide chain, which may ultimately become a protein; stop codons signal the termination of this process, releasing the amino acid chain.
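The roles of the start and stop codons can be shown with a deliberately simplified translation routine: begin at the first AUG, read triplets, and stop at the first stop codon. This is a sketch under strong assumptions; real initiation depends on sequence context and cellular machinery that are ignored here, and the function name `translate` and the example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon (or the end)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: release the chain
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical transcript: GG | AUG UUU GAA UAG | CC
print(translate("GGAUGUUUGAAUAGCC"))  # MFE
```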

Stop codons were historically given many different names, as they each corresponded to a distinct class of mutants that all behaved in a similar manner. These mutants were first isolated within bacteriophages (T4 and lambda), viruses that infect the bacterium Escherichia coli. Mutations in viral genes weakened their infectious ability, sometimes creating viruses that were able to infect and grow within only certain varieties of E. coli.

1. Amber mutations were the first set of nonsense mutations to be discovered. They were isolated by Richard Epstein and Charles Steinberg, but named after their friend Harris Bernstein (see Edgar, pp. 580-581 [7], for the story behind this incident).

Viruses with amber mutations are characterized by their ability to infect only certain strains of bacteria, known as amber suppressors. These bacteria carry their own mutation that allows a recovery of function in the mutant viruses. For example, a mutation in the tRNA that recognizes the amber stop codon allows translation to "read through" the codon and produce full-length protein, thereby recovering the normal form of the protein and "suppressing" the amber mutation. Thus, amber mutants are an entire class of virus mutants that can grow in bacteria that contain amber suppressor mutations.

2. Ochre mutations were the second set of stop codon mutations to be discovered. Given a color name to match the name of amber mutants, ochre mutant viruses had a similar property in that they recovered infectious ability within certain suppressor strains of bacteria. The set of ochre suppressors was distinct from amber suppressors, so ochre mutants were inferred to correspond to a different nucleotide triplet. Through a series of mutation experiments comparing these mutants with each other and other known amino acid codons, Sydney Brenner concluded that the amber and ochre mutations corresponded to the nucleotide triplets "UAG" and "UAA". [8]

3. Opal mutations or umber mutations. The third and last stop codon in the standard genetic code was discovered soon after, corresponding to the nucleotide triplet "UGA". Nonsense mutations that created this premature stop codon were later called opal mutations or umber mutations.

In RNA: UAG ("amber"), UAA ("ochre"), UGA ("opal" or "umber").

In DNA: TAG ("amber"), TAA ("ochre"), TGA ("opal" or "umber").

Exceptions to the Universal Genetic Code (UGC) in mitochondria
Organism | Codon | Standard | Novel
Mammalian | AGA, AGG | Arginine | Stop codon
Mammalian | AUA | Isoleucine | Methionine
Mammalian | UGA | Stop codon | Tryptophan
Invertebrates | AGA, AGG | Arginine | Serine
Invertebrates | AUA | Isoleucine | Methionine
Invertebrates | UGA | Stop codon | Tryptophan
Yeast | AUA | Isoleucine | Methionine
Yeast | UGA | Stop codon | Tryptophan
Yeast | CUA | Leucine | Threonine
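The mitochondrial exceptions in the table can be modeled as small overrides applied on top of the standard code. The sketch below uses only the reassignments listed in the table; the dictionary keys, the function name `decode` and the test sequence are illustrative assumptions, and real variant codes may include further differences not listed here.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; "*" marks a stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reassignments taken from the table of mitochondrial exceptions.
MITO_EXCEPTIONS = {
    "mammalian": {"AGA": "*", "AGG": "*", "AUA": "M", "UGA": "W"},
    "invertebrate": {"AGA": "S", "AGG": "S", "AUA": "M", "UGA": "W"},
    "yeast": {"AUA": "M", "UGA": "W", "CUA": "T"},
}


def decode(mrna, lineage=None):
    """Map each codon to a one-letter meaning ("*" = stop) under the chosen code."""
    table = {**STANDARD, **MITO_EXCEPTIONS.get(lineage, {})}
    return "".join(table[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


message = "AUGAUAUGAAGACUA"  # hypothetical: AUG AUA UGA AGA CUA
print(decode(message))                  # MI*RL
print(decode(message, "mammalian"))     # MMW*L
print(decode(message, "invertebrate"))  # MMWSL
print(decode(message, "yeast"))         # MMWRT
```

The same nucleotide string therefore yields different products depending on which code the translating system uses.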

Exceptions to the genetic code: Although the vast majority of living organisms today use the standard genetic code, geneticists have discovered a few variations on this code. Moreover, these variants are found in different evolutionary lineages and consist of different translations of a few codons.

The CUG codon, usually translated as leucine, is translated as serine in many species of the fungal genus Candida.

Many species of green algae of the genus Acetabularia use the stop codons UAG and UAA to encode glycine.

Many ciliates, such as Paramecium tetraurelia, Tetrahymena thermophila or Stylonychia lemnae, use the codons UAG and UAA to code for glutamine instead of stop. UGA is the only stop codon used by these cells.

The ciliate Euplotes octocarinatus uses the codon UGA to encode cysteine, leaving UAG and UAA as stop signals.

In all three domains of life, a twenty-first amino acid, selenocysteine, is sometimes found, encoded by the UGA codon (normally a stop codon).

In archaea and eubacteria, a twenty-second amino acid, pyrrolysine, is sometimes found, encoded by UAG (also usually a stop codon).

The first amino acid incorporated (determined by the start codon AUG) is methionine in most eukaryotes, more rarely valine (in some eukaryotes), and formyl-methionine in most prokaryotes. In addition, the start codon is sometimes GUG or UUG in some prokaryotes.

It is therefore thought that life originally used a smaller number of amino acids. Some of these amino acids were then modified, increasing their number (by a phenomenon similar to the formation of selenocysteine and pyrrolysine, which derive from serine and lysine, respectively; selenocysteine is formed while its serine precursor is attached to the transfer RNA). These new amino acids then took over a subset of transfer RNAs and their associated codons. Traces of this phenomenon may be visible in glutamine, which in some bacteria is still derived from glutamate attached to its tRNA.

Another exception: the code is sometimes ambiguous. For example, in the same organism (Escherichia coli, for example), the codon UGA sometimes codes for the 21st amino acid mentioned above (selenocysteine) and sometimes for "stop".
