Some of the literature states that all start codons code for methionine. However, in the standard genetic code, the alternative start codons code for other amino acids (for example, leucine). Does that mean these codons will code for leucine when they are encountered during translation (that is, after the start codon has been recognised and translated)?
I commented that this was a duplicate, but reading the question more carefully you seem to be asking something slightly different.
In the context of a 'start' these codons will be recognised by fMet-tRNA and a formyl-methionine will be inserted as the first amino acid. Subsequent occurrences of the same codon within the open reading frame will be translated normally (e.g. GUG > valine).
The use of GTG as an initiation codon in the E. coli lacI gene
In this paper
Frottin et al. (2006) The Proteomics of N-terminal Methionine Cleavage. Molec. & Cell. Proteomics 5: 2336-2349
the authors report assays of E. coli methionine aminopeptidase with model peptides showing that when Lys is the 2nd residue, Met removal is very inefficient. They also show that when Pro is the 3rd residue, Met removal is very inefficient.
This explains the fact that when the lacI repressor protein was sequenced:
Beyreuther et al. (1973) The amino-acid sequence of lac repressor. PNAS 70: 3576-3580
it was found to have a Met residue at its N terminus (sequence Met-Lys-Pro-).
However, when the lacI gene was sequenced:
Farabaugh (1978) Sequence of the lacI gene. Nature 274: 765-769
the corresponding DNA sequence was GTG AAA CCA, demonstrating that the N-terminal residue is encoded by a GTG(Val) codon. LacI residue Val23 is encoded by a GTG codon, demonstrating the normal use of that codon in the body of the mRNA.
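The lacI example can be sketched in a few lines of Python. This is only an illustration, not a model of the ribosome: the function name `translate_orf` and the set of accepted start codons are assumptions made for the example. The first codon is read as Met if it is a recognised start codon, and every later codon is read from the standard table.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption for this sketch: the initiation codons accepted at position 1.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_orf(dna):
    """Translate an ORF given as DNA, reading the first codon as Met."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    if not codons or codons[0] not in START_CODONS:
        raise ValueError("sequence does not begin with a recognised start codon")
    protein = ["M"]  # the initiator tRNA inserts (formyl-)methionine
    for codon in codons[1:]:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_orf("GTGAAACCA"))  # start of lacI: GTG AAA CCA
print(translate_orf("ATGGTGTAA"))  # an internal GTG is read as Val
```

The first call reproduces the Met-Lys-Pro N terminus described above even though the first codon is GTG; in the second call the same codon inside the reading frame gives valine.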
The Genetic Code
To summarize what we know to this point, the cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template converts nucleotide-based genetic information into a protein product. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 letters. Each amino acid is specified by a three-nucleotide sequence called a triplet codon. The relationship between a nucleotide codon and its corresponding amino acid is called the genetic code.
Given the different numbers of “letters” in the mRNA and protein “alphabets,” a single nucleotide cannot correspond to a single amino acid; combinations of nucleotides are needed. Using a three-nucleotide code means that there are a total of 64 (4 × 4 × 4) possible combinations; therefore, most amino acids are encoded by more than one nucleotide triplet (Figure 8).
Figure 8: This figure shows the genetic code for translating each nucleotide triplet, or codon, in mRNA into an amino acid or a termination signal in a nascent protein. (credit: modification of work by NIH)
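The counting argument behind the triplet code can be checked directly. The short sketch below enumerates all "words" of one, two and three nucleotides and asks whether there are enough of them to name 20 amino acids; the helper `min_codon_length` is a name introduced only for this illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
N_AMINO_ACIDS = 20

# Enumerate every possible word of length 1, 2 and 3.
for length in (1, 2, 3):
    words = ["".join(p) for p in product(NUCLEOTIDES, repeat=length)]
    enough = len(words) >= N_AMINO_ACIDS
    print(f"length {length}: {len(words)} combinations, enough for 20 amino acids: {enough}")


def min_codon_length(n_symbols, n_meanings):
    """Smallest word length whose number of combinations covers n_meanings."""
    length = 1
    while n_symbols ** length < n_meanings:
        length += 1
    return length


print(min_codon_length(4, N_AMINO_ACIDS))
```

With four nucleotides, words of length three are the shortest that cover 20 amino acids, which leaves 64 - 20 = 44 combinations to serve as synonyms and stop signals.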
Three of the 64 codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5′ end of the mRNA. The genetic code is universal. With a few exceptions, virtually all species use the same genetic code for protein synthesis, which is powerful evidence that all life on Earth shares a common origin.
Characteristics and Exceptions of Genetic Code - Discussed!
There is an intimate connection between genes and synthesis of polypeptides or enzymes. Genes are made up of nucleotides arranged in a specific manner. In modern terminology a gene refers to a cistron of DNA. A cistron is made of a large number of nucleotides.
Arrangement of nucleotides or their nitrogen bases is connected with the synthesis of proteins by influencing the incorporation of amino acids in them. The relationship between the sequence of amino acids in a polypeptide and nucleotide sequence of DNA or mRNA is called genetic code.
DNA contains only four types of nitrogen bases or nucleotides while the number of amino acids is 20. It was, therefore, hypothesised that a triplet code (consisting of three adjacent bases for one amino acid) is operative. The lines of research that helped in deciphering the triplet genetic code are as follows.
1. Crick et al (1961) observed that deletion or addition of one or two base pairs in DNA of T4 bacteriophage disturbed normal DNA functioning. However, when three base pairs were added or deleted the disturbance was minimum.
2. Nirenberg and Matthaei (1961) argued that a singlet code (one amino acid specified by one nitrogen base) can specify only 4 amino acids (4¹), a doublet code only 16 (4²), while a triplet code can specify up to 64 amino acids (4³). As there are 20 amino acids, a triplet code (three nitrogen bases for one amino acid) is the smallest that can be operative.
3. Nirenberg (1961) prepared polymers of the four nucleotides UUUUUU…(Polyuridylic acid), CCCCCC…(polycytidylic acid), AAAAAAA…(polyadenylic acid) and GGGGGGG…(polyguanylic acid). He observed that poly-U stimulated the formation of polyphenylalanine, poly-C of polyproline while poly-A helped form polylysine. However poly-G did not function (it formed triple-stranded structure which does not function in translation). Later on, GGG was found to code for amino acid glycine.
Table. Assignment of mRNA codons to Amino Acids.
4. Khorana (1964) synthesised copolymers of nucleotides such as UGUGUGUGUG and observed that they stimulated the formation of polypeptides with alternating amino acids, such as cysteine-valine-cysteine. This is possible only if three adjacent nucleotides specify one amino acid (e.g. UGU) and the next three specify the second amino acid (e.g. GUG).
5. The triplet codons were confirmed by in vivo codon assignment through (i) amino acid replacement studies and (ii) frameshift mutations.
6. Gradually all the codons were worked out; some amino acids are specified by more than one codon. The code languages of DNA and mRNA are complementary. Thus the two codons for phenylalanine are UUU and UUC in mRNA, while the corresponding triplets are AAA and AAG in DNA.
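Khorana's copolymer result and the DNA/mRNA complementarity in point 6 can be illustrated with a small sketch. Only the two codon assignments named in the text (UGU and GUG) are used; the function names are assumptions made for this example.

```python
# Only the two assignments mentioned in the text are needed here.
CODONS = {"UGU": "Cys", "GUG": "Val"}


def read_codons(rna, frame=0):
    """Split an RNA string into complete triplets starting at the given frame."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


def translate(rna, frame=0):
    return [CODONS[codon] for codon in read_codons(rna, frame)]


poly_ug = "UG" * 9  # UGUGUG... copolymer, 18 nucleotides

# Whatever the starting point, a triplet reading alternates UGU and GUG.
for frame in range(3):
    print(frame, "-".join(translate(poly_ug, frame)))

# Base-by-base complement of an mRNA codon gives the DNA triplet.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def template_triplet(mrna_codon):
    return "".join(RNA_TO_DNA[base] for base in mrna_codon)


print(template_triplet("UUU"), template_triplet("UUC"))
```

A doublet reading of the same copolymer would give a single repeated word (UG or GU) and therefore a homopolymer, so the alternating product is what points to an odd-length, triplet code.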
1. Triplet Code:
Three adjacent nitrogen bases constitute a codon which specifies the placement of one amino acid in a polypeptide.
2. Start Signal:
Polypeptide synthesis is signalled by two initiation codons: AUG, the methionine codon, and (less commonly) GUG, the valine codon.
3. Stop Signal:
Polypeptide chain termination is signalled by three termination codons UAA (ochre), UAG (amber) and UGA (opal). They do not specify any amino acid and are hence also called nonsense codons.
4. Universal Code:
The genetic code is applicable universally, i.e., a codon specifies the same amino acid from a virus to a tree or human being. Thus mRNA from chick oviduct introduced into Escherichia coli produces ovalbumin in the bacterium exactly like that formed in the chick.
5. Nonambiguous Codons:
One codon specifies only one amino acid and not any other.
6. Related Codons:
Amino acids with similar properties have related codons, e.g. aromatic amino acids tryptophan (UGG), phenylalanine (UUC, UUU), tyrosine (UAC, UAU).
7. Commaless Code:
The genetic code is continuous and does not possess pauses after the triplets. If a nucleotide is deleted or added, the whole genetic code downstream will read differently. Thus a polypeptide having 50 amino acids shall be specified by a linear sequence of 150 nucleotides. If a nucleotide is added or deleted in the middle of this sequence, the first 25 amino acids of the polypeptide will be the same but the next 25 amino acids will be quite different.
8. Non-overlapping Code:
A nitrogen base is a constituent of only one codon.
9. Degeneracy of Code:
Since there are 64 triplet codons and only 20 amino acids, incorporation of some amino acids must be influenced by more than one codon. Only tryptophan (UGG) and methionine (AUG) are specified by single codons.
All other amino acids are specified by 2-6 codons each. Codons that specify the same amino acid are called degenerate codons. In most degenerate codons the first two nitrogen bases are the same while the third one is different. As the third nitrogen base often has no effect on coding, it is called the wobble position.
10. Colinearity:
Both polypeptide and DNA or mRNA have a linear arrangement of their components. Further, the sequence of triplet nucleotide bases in DNA or mRNA corresponds to the sequence of amino acids in the polypeptide manufactured under the guidance of the former. A change in codon sequence also produces a corresponding change in the amino acid sequence of the polypeptide.
11. Cistron-Polypeptide Parity:
Portion of DNA called cistron (=gene) specifies the formation of a particular polypeptide. It means that genetic system should have as many cistrons (= genes) as the types of polypeptides found in the organisms.
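Two of the characteristics listed above, degeneracy and the continuous (pause-free) reading of triplets, can be checked with a short sketch. The test sequence and the position of the deleted nucleotide are made up for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Degeneracy: how many codons specify each amino acid?
codons_per_amino_acid = Counter(CODON_TABLE.values())
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)
print("single-codon amino acids:", single_codon)
print("stop codons:", codons_per_amino_acid["*"])
print("largest codon family:", max(codons_per_amino_acid.values()))


def translate(dna):
    """Translate every complete triplet; incomplete trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Continuous reading: deleting one nucleotide shifts every downstream codon.
normal = "ATGGCTAAACTGGACCGTTCTGGA"   # hypothetical sequence, 8 codons
mutant = normal[:12] + normal[13:]    # delete the first base of codon 5
print(translate(normal))
print(translate(mutant))
```

The residues before the deletion are unchanged, while everything after it is read in a different frame, which is the behaviour described for a code without punctuation.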
1. Different Codons:
In Paramecium and some other ciliates the termination codons UAA and UAG code for glutamine.
2. Overlapping Genes:
φX174 has 5375 nucleotides that code for 10 proteins, which together would require more than 6000 bases. Three of its genes, E, B and K, overlap other genes. The nucleotide sequence at the beginning of gene E is contained within gene D. Likewise gene K overlaps with genes A and C. A similar condition is found in SV40.
3. Mitochondrial Genes:
AGG and AGA code for arginine but function as stop signals in human mitochondrion. UGA, a termination codon, corresponds to tryptophan while AUA (codon for isoleucine) denotes methionine in human mitochondria.
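One convenient way to represent such exceptions is as a small set of overrides applied to the standard table. The sketch below does this for the human mitochondrial reassignments listed above; the test sequence is invented for the example.

```python
from itertools import product

# Standard genetic code for mRNA, in the conventional UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Reassignments in human mitochondria, as listed in the text.
HUMAN_MITO_OVERRIDES = {"AGA": "*", "AGG": "*", "UGA": "W", "AUA": "M"}
HUMAN_MITO = {**STANDARD, **HUMAN_MITO_OVERRIDES}


def translate(rna, table):
    """Translate complete triplets until the first stop codon in the given table."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGAUAUGAAGAGGC"  # hypothetical: AUG AUA UGA AGA GGC
print(translate(message, STANDARD))    # UGA terminates the chain
print(translate(message, HUMAN_MITO))  # UGA reads as Trp, AGA terminates
```

The same message gives two different products depending on the table, which is why the genetic code in use has to be stated when translating mitochondrial sequences.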
Transcription in Eukaryotic Cells
Transcription is more complex in eukaryotic cells than in those that are prokaryotic. Activator proteins bind to DNA regions known as enhancers, which help determine which genes are switched on and speed up transcription. Repressor proteins bind to DNA regions called silencers, which interfere with activator proteins and slow down transcription. Coactivators, adapter molecules which coordinate signals from activator and repressor proteins, relay this information to basal factors which then position RNA polymerase at the start of the coding region of the gene to begin transcription.
Once the actual transcription begins, ribonucleotides containing 3 phosphate groups form hydrogen bonds through the process of complementary base pairing with the exposed deoxyribonucleotides on the unwound strand that is to be transcribed. The ribonucleotides are then covalently bonded together by phosphodiester bonds, the energy being supplied by the cleavage of two phosphate groups from the ribonucleotide triphosphate (Figure 16). (The phosphodiester bond refers to the phosphate on the 5'C of the newly inserted nucleotide covalently bonding to the 3'C of the last ribonucleotide in the mRNA chain.)
Figure 18: Transcription of mRNA Complementary to DNA. RNA is synthesized by complementary base pairing of free ribonucleotides with the deoxyribonucleotides of a gene. The ribonucleotides are then covalently bonded together by phosphodiester bonds, the energy being supplied by the cleavage of two phosphate groups from the ribonucleotide triphosphate. The enzyme responsible for transcription is RNA polymerase (not shown here).
Unlike those of prokaryotes, most genes in higher eukaryotic cells contain large amounts (as much as 98% in the human genome) of regions called introns that are not part of the code for the final protein. These are interspersed among the coding regions or exons that actually code for the final protein.
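The base-pairing rule used during transcription can be written as a simple lookup. The sketch below assumes the template strand is supplied in the 3'→5' direction, so that the mRNA comes out 5'→3'; the function name and the example sequence are illustrative only.

```python
# Pairing of each template deoxyribonucleotide with the incoming ribonucleotide.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') built against a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TACCACTTTGGT"    # hypothetical template strand, written 3'->5'
print(transcribe(template))  # mRNA written 5'->3'
```

Note that uracil, not thymine, pairs with adenine in the RNA product, so the mRNA matches the non-template DNA strand except for U in place of T.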
RNA polymerase copies both the exons and the introns to form what is called precursor mRNA or pre-mRNA. Early in transcription, a cap in the form of an unusual nucleotide, 7-methylguanylate, is added to the 5' end of the pre-mRNA. This cap helps ribosomes attach for translation. As transcription is nearly completed, a series of 100-250 adenine ribonucleotides called a poly-A tail is added to the 3' end of the pre-mRNA. This poly-A tail is thought to help transport the mRNA out of the nucleus and may stabilize the mRNA against degradation in the cytoplasm. After transcription of the precursor mRNA, non-protein coding regions (introns) are excised and coding regions (exons) are joined together by complexes of ribonucleoproteins called spliceosomes to produce what is termed mature mRNA, as shown in Figure 9. This process is called RNA processing.
The mature mRNA then passes through the pores in the nuclear membrane to be translated into protein by tRNA on eukaryotic 80S ribosomes (composed of 60S and 40S subunits) in a manner similar to prokaryotes.
The mRNA molecule is divided up into codons. A codon is a series of three consecutive mRNA bases coding for one specific amino acid. The various codons and the amino acids for which they code are shown in Figure 8. There are 64 codons. One codon, AUG, also serves as a start codon to initiate translation, and three codons, UAG, UAA, and UGA, function as stop or nonsense codons to terminate translation. (Alternative start codons are different from the standard AUG codon and are found occasionally in both prokaryotes and eukaryotes.)
In addition to the genes that are transcribed into mRNA to be translated into polypeptides and proteins, there are also specific genes in the DNA from which each of the different transfer RNAs (tRNAs) and the ribosomal RNAs (rRNAs) are transcribed.
Once transcribed, the mRNA can be translated into protein.
As mentioned above, introns make up the majority of DNA in higher eukaryotic cells and for decades were considered to be "junk DNA" accumulated over millions of years of evolution. Over recent years, however, it has been discovered that much of this non-coding DNA, although it does not code for protein synthesis, is transcribed into functional molecules of RNA with names such as antisense RNA, microRNA, and riboswitch RNA that play important roles in whether or not a protein is actually made.
Antisense RNA is RNA transcribed off of the strand of DNA complementary to the one being transcribed into mRNA. In other words, it is an RNA molecule complementary to an mRNA and as such may base pair with the mRNA and prevent it from being translated into protein.
MicroRNA, often transcribed from intron DNA, folds over upon itself to resemble double-stranded RNA, a form of RNA produced by many viruses during their life cycle. Viral double-stranded RNA activates a host defense mechanism that degrades that viral RNA. The MicroRNA frequently binds to mRNA and tricks this defense mechanism into degrading that mRNA so it can not be translated into protein.
Riboswitch RNA, often transcribed from introns, exists in an inactive form until a specific target chemical binds. The binding of the target chemical turns the riboswitch RNA to an active form that can be translated into a specific protein.
What does a start and stop codon do?
It all has to do with how DNA stores the information to form proteins. A quick overview:
DNA molecules are long: you have about 2 to 3 meters of DNA in one cell. Only a tiny piece of this DNA is transcribed into mRNA when a protein has to be produced. The ribosome that reads the mRNA then has to know where the protein-coding part starts and ends; that is where the start and stop codons come in.
Every amino acid has its own specific codons (a codon = 3 bases of DNA/mRNA). The start codon is almost always AUG in mRNA and codes for the amino acid methionine. This is the signal for the ribosome to start translation.
There are three stop codons (UAA, UAG and UGA); these do not code for an amino acid but only act as a signal for the ribosome to stop translation.
So in between the start and stop codons is the coding region of a gene that is translated into a protein.
The importance of ribosomal RNA
The ribosome itself is made of both ribosomal proteins and rRNA (ribosomal RNA). In a Perspective from 2000 titled "The Ribosome is a Ribozyme" (https://science.sciencemag.org/content/289/5481/878), Thomas Cech describes how the crystal structure of the ribosome advanced our understanding of the roles of the rRNA and protein components of this critical complex. The crystal structure of the large ribosomal subunit from the archaeon Haloarcula marismortui at high resolution was published in 2000 (https://science.sciencemag.org/content/289/5481/905 Ban et al.). Venkatraman Ramakrishnan, Thomas A. Steitz and Ada E. Yonath later shared the 2009 Nobel Prize in Chemistry for their contributions to our understanding of this structure. This structure revealed that the active site, where the peptide bond forms between the amino acids in the P and A sites, is made up largely not of protein, as most cellular catalysts are, but of ribosomal RNA. A ribozyme is the term for an RNA that can catalyze a chemical reaction. Therefore, the ribosome is a ribozyme.
Structure of the 50S ribosomal subunit assembly (intermediate state 1) from E. coli K-12 (https://www.rcsb.org/structure/6GC7) with ribosomal RNA in orange and proteins in other colors.
Fig. 15 Detailed diagram of the organization of the beginning of a gene (strand direction descriptors follow the NCBI convention). Note that "positive" and "negative" strand designations are relative to the p arm of the chromosome, while the "sense" strand designation is relative to the orientation of a particular gene.
A particular strand of DNA could be divided into codons in three different ways, depending on the starting position from which the nucleotide triplets are demarcated. Since genes can be coded for on either of the complementary strands, a double-stranded piece of DNA can thus have a total of six different frames of reference for demarcating codons. Each of these six is called a reading frame.
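The six reading frames can be enumerated mechanically: three offsets on the given strand and three on its reverse complement. The sketch below is a minimal illustration; the function names and the example sequence are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Map (strand, offset) to the list of complete codons in that frame."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = [
                seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)
            ]
    return frames


for (strand, offset), codons in six_frames("ATGAAACCCGGG").items():
    print(strand, offset, " ".join(codons))
```

Each frame demarcates a completely different series of codons from the same double-stranded sequence, which is why the position of the start codon matters so much.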
Nearly all DNA sequences coding for proteins begin with the same codon, ATG, which codes for the amino acid methionine. This codon is also known as the start codon. In Fig. 15, the ATG sequence on the negative strand demarcates the beginning of a coding sequence. (Since the gene in the example is coded on the negative strand, for that gene the negative strand is the sense strand.) The reading frame which contains that methionine is therefore the one correct reading frame (out of the possible six), which codes for the protein. The other five reading frames are essentially random gibberish. Methionine will appear at the beginning of nearly all genes since ATG is by far the most common codon used to signal the start of protein coding, but not all methionines in a sequence are start codons. In addition to its role in the initiation of translation, methionine is also simply an amino acid equivalent to the other 19 amino acids.
Because of their random nature, the other five possible reading frames derived from a DNA sequence will often contain stop codons that are generated by chance. If those reading frames actually coded for proteins, the stop codons would indicate to the ribosomes that they should stop adding amino acids to the polypeptide. Since it would not make sense biologically for translation to stop after so short a time, the presence of many stop codons can serve as an indication that a particular reading frame does not legitimately code for a protein sequence. In a DNA sequence composed of a random series of nucleotides, stop codons should occur by chance on average every 21 codons (4³ = 64 possible combinations of three nucleotides divided by the three possible stop codons). Therefore it is unlikely that any reading frame would continue for a distance of much longer than 21 codons without being interrupted by a stop codon unless it actually coded for a protein. A segment of DNA that contains a reading frame in which a long sequence is not interrupted by any stop codon is called an open reading frame (ORF). Therefore, the first step in searching a new genome sequence for unknown protein-coding genes is usually to identify ORFs. Note: In order for a segment of DNA to be considered an ORF, the stretch of DNA lacking a stop codon must be much longer than what would be likely to occur by chance in a random sequence of nucleotides (i.e. many more than 21 codons, which means hundreds of nucleotides).
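A very simple ORF search along these lines might look like the sketch below. It is a minimal illustration under stated assumptions: it scans only the three forward frames (the reverse strand would be handled by running it again on the reverse complement), it accepts only ATG as a start, and the `min_codons` threshold of 100 is an arbitrary choice for the example rather than an established cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Expected spacing of stop codons in random sequence: 64 codons / 3 stops.
EXPECTED_CODONS_BETWEEN_STOPS = 4 ** 3 / len(STOP_CODONS)


def find_orfs(dna, min_codons=100):
    """Return (start, end) index pairs of ATG...stop stretches on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    codons from ATG up to, but not including, the stop codon.
    """
    orfs = []
    for offset in range(3):
        start = None
        for i in range(offset, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


print(round(EXPECTED_CODONS_BETWEEN_STOPS, 1))
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"  # hypothetical toy sequence
print(find_orfs(toy, min_codons=5))
print(find_orfs(toy))  # far too short to pass the default threshold
```

Setting the threshold well above the chance spacing of about 21 codons is what separates likely genes from stretches that happen to lack a stop codon.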
You should be aware that there are genes that do not code for proteins. Most notably, these are genes that code for non-messenger RNAs (i.e. transfer RNA=tRNA and ribosomal RNA=rRNA). In the case of RNA coding genes, the RNA transcript is the final structural product and is not subsequently translated. Thus tRNA and rRNA genes do not contain codons, nor does the term reading frame have any relevance to them.
Cracking the genetic code
After the structure of DNA was deciphered by James Watson, Francis Crick and Rosalind Franklin, serious efforts to understand the nature of the encoding of proteins began. George Gamow postulated that a three-letter code must be employed to encode the 20 different amino acids used by living cells to build proteins. The first elucidation of a codon was done by Marshall Nirenberg and Heinrich J. Matthaei in 1961 at the National Institutes of Health. They used a cell-free system to translate a poly-uracil RNA sequence (or UUUUU… in biochemical terms) and discovered that the polypeptide they had synthesized consisted of only the amino acid phenylalanine. They thereby deduced from this poly-phenylalanine that the codon UUU specified the amino acid phenylalanine. Extending this work, Nirenberg and his coworkers were able to determine the nucleotide makeup of each codon. In order to determine the order of the sequence, trinucleotides were bound to ribosomes and radioactively labeled aminoacyl-tRNA was used to determine which amino acid corresponded to the codon. Nirenberg's group was able to determine the sequences of 54 out of 64 codons. Subsequent work by Har Gobind Khorana identified the rest of the code, and shortly thereafter Robert W. Holley determined the structure of transfer RNA, the adapter molecule that facilitates translation. In 1968, Khorana, Holley and Nirenberg shared the Nobel Prize in Physiology or Medicine for their work.
Why Are Start and Stop Codons Important?
Start and stop codons are important because they tell the cell machinery where to begin and end translation, the process of making a protein. The start codon also sets up the reading frame of the DNA strand, indicating that each triplet after that point codes for a specific amino acid.
Start and stop codons are found both on the original DNA strand in the nucleus of the cell and on the messenger RNA strand that serves as the protein template. The mRNA that corresponds to a specific gene on the DNA strand is synthesized in the nucleus using the antisense strand of DNA as a guide to the order of codons. This mRNA strand then travels to a ribosome in the cytoplasm, where protein assembly takes place.
In most organisms, the usual start codon is ATG, a triplet made up of the DNA bases adenine, thymine and guanine. ATG also codes for the amino acid methionine when found in the middle of a gene. In the mRNA template, ATG is replaced by AUG because the base uracil always appears in place of thymine in RNA.
Stop codons come in three different forms: TGA, TAG and TAA. In RNA, these three codons appear as UGA, UAG and UAA. Unlike the start codon, none of the stop codons code for an amino acid.
These results support the idea that the features of 5′ UTRs in most multicellular, and probably a wide range of unicellular, eukaryotes are largely dictated by random genetic drift and mutational processes that cause stochastic turnover in transcription-initiation sites and premature start codons. Under the simplest model that we present, natural selection only indirectly influences the lengths of UTRs through the mutational origin of premature initiation codons within the UTR. If this hypothesis is correct, selection for gene-specific regulatory features need not be invoked to explain the 1,000-fold range of 5′-UTR lengths among genes within species. The broad distribution of UTR lengths with a long tail to the right (fig. 1) is expected to arise via mutational processes alone, and the observed within-species CVs in UTR lengths are also consistent with the robust theoretical prediction of 1.2–1.3.
An attractive feature of the proposed theory is the insensitivity of the model's predictions to the actual length of TISs, which might vary among species. Most notably, once n exceeds 5, there is a near invariance in the steady-state distribution of 5′-UTR lengths predicted by the model, despite the fact that the average distance between random TISs is an increasing function of n. This asymptotic behavior results from the strong barrier to upstream movement of TISs imposed by the neutral accumulation of potentially harmful upstream PSCs as well as by the vulnerability of overly long UTRs to mutational elimination by the appearance of PSCs within their confines.
The one incompatibility between the data and the predicted 5′-UTR length distribution is the shift of the observed distributions to the right by ∼30–50 nt, which suggests that the very shortest size classes may be selected against by forces other than PSCs. As discussed above, some such selection is expected to result from stochastic variation in points of transcription initiation operating at the cellular level, which would have deleterious consequences for genes with TISs close enough to the translation-start codon to occasionally initiate transcription beyond the translation-start site.
A second potential complication not incorporated into the model concerns the presence of (external) introns in the 5′ UTR, which may inhibit the mutational production of viable alleles with short UTRs. As noted above, the accumulation of PSCs may render such introns exceptionally stable. In the event that a mutational event produces a TIS within a 5′-UTR intron that does not contain a harmful downstream PSC, a new successful allele will be produced, with the upstream portion of the 5′ UTR of the ancestral gene being eliminated and the downstream portion of the intron acquiring the new TIS being incorporated. Given the average sizes of eukaryotic 5′ UTRs (this paper) and the average sizes of introns (Lynch and Conery 2003), the net effect of these potential changes will often be an overall increase in the length of the UTR. The mere presence of introns may also inhibit the evolution of short 5′ UTRs for purely structural reasons, as there appears to be a minimal exon size essential for efficient splicing (Sterner, Carlo, and Berget 1996).
Because short TISs magnify the chances of the transcriptional apparatus being subverted to an inappropriate (false positive) site, TISs of at least moderate complexity would seem to be required for efficient transcription. For both eubacterial and archaeal genomes, the efficiency of natural selection is sufficient to maintain the number of spurious core promoters at levels below random expectations (Hahn, Stajich, and Wray 2003). That inappropriate utilization of random TISs actually occurs in eukaryotes is suggested by the fact that ∼25% of human cDNAs contain no obvious open reading frame and are largely derived from AT-rich genomic regions likely to harbor spurious TATA sequences (Ota et al. 2004). Nearly 10 times more genomic DNA is transcribed than can be accounted for by known exons (Kapranov et al. 2002), and similar observations have been made in Giardia (Elmendorf, Singer, and Nash 2001). Many of these noncoding RNAs are from the antisense strand, overlap the exons of coding DNA (Cawley et al. 2004), and could have a function, but that remains to be determined.
This being said, n need not be very large to minimize the chances of false positives. Under the assumption of equal nucleotide frequencies, the expected distance between random sequences of n nucleotides is 4ⁿ bp, so a genome 10⁸ bp in length (e.g., an average invertebrate) would contain ∼24,400 TISs of length n = 6, ∼1,530 of length n = 8, and only ∼95 of length n = 10. Thus, because the number of genes per eukaryotic genome usually exceeds 10⁴, the length of a TIS need not be much greater than eight to ensure that nearly all such sequences are actively maintained by selection in the vicinity of functional genes. Any further increase in n would impose a higher mutation rate to defective alleles without providing any obvious benefits in terms of transcriptional efficiency. With μ being ∼10⁻⁸ per generation (Denver et al. 2004), and ∼15,000 genes in a typical eukaryotic genome, the preceding results suggest that just ∼0.2% of gametes would carry a new 5′-UTR associated null allele at some locus if n = 8. This level of null mutation is easily accommodated by existing estimates of the gametic lethal mutation rate of ∼1.0% (Lynch and Walsh 1998; Fry et al. 1999).
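The expected numbers of random TISs quoted above follow from dividing the genome length by 4ⁿ. A minimal sketch of that arithmetic, assuming equal nucleotide frequencies as in the text (the function name is introduced here only for illustration):

```python
def expected_random_sites(genome_bp, n):
    """Expected count of a specific n-nucleotide sequence in a random genome,
    assuming all four nucleotides are equally frequent."""
    return genome_bp / 4 ** n


GENOME_BP = 10 ** 8  # genome length used in the text

for n in (6, 8, 10):
    print(n, round(expected_random_sites(GENOME_BP, n)))
```

The counts fall by a factor of 16 for every two nucleotides added to the motif, which is why a motif of about eight nucleotides already brings the number of chance occurrences below the number of genes.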
Although the variation among phylogenetic groups in the average lengths of 5′ UTRs is exceptionally small relative to other genomic attributes, some significant lineage-specific differences may exist (it should be noted though that the standard errors in table 1 are not corrected for phylogenetic nonindependence). A number of second-order effects could be responsible for such differences. First, in deriving the theoretical expectations, it was assumed that all four nucleotides are equally likely, whereas most eukaryotic genomes deviate somewhat from these conditions. However, although the average lengths of 5′ UTRs may be expected to scale negatively with the random expected frequencies of TATA and ATG sequences within UTR regions, no such relationship exists in the observed data (r² = 0.08 and 0.01, respectively). Second, for reasons discussed above, lineage-specific variation in the frequency and/or numbers of external introns could impose different constraints on the indirect selective pressures toward shorter UTRs. Third, the model developed above considers only nucleotide-substitution mutations, whereas significant interspecific differences may exist in rates of deletion and/or insertion (Petrov et al. 2000).
In summary, our empirical and theoretical results support the idea that the reduction in Ne that accompanied the evolution of eukaryotes, particularly multicellular species, produced a population-genetic environment conducive to the movement of TISs to random positions, subject only to the constraint imposed by the stochastic mutational production of premature start codons. Some microbial eukaryotes may have large-enough effective population sizes to selectively maintain average 5′-UTR lengths below the expectations under effective neutrality (Ghosh et al. 1994; Singh et al. 1997; Liston and Johnson 1999; Yee et al. 2000; Adam 2001), with virtually all such species exhibiting other genomic hallmarks of large Ne, including small sizes and numbers of introns and a low incidence of mobile genetic elements (Lynch and Conery 2003). However, essentially all multicellular species may have an Ne insufficiently large to prevent the physical expansion of 5′ UTRs by nonadaptive mechanisms. If this hypothesis is correct, selection for gene-specific regulatory features need not be invoked to explain the expansion of eukaryotic 5′ UTRs relative to the situation in prokaryotes. Nevertheless, once permanently established, expanded 5′ UTRs may have provided novel substrate for the evolution of mechanisms for posttranscriptional regulation of eukaryotic gene expression, providing still another example of how a reduction in Ne can passively promote the evolution of novel forms of gene architecture that ultimately facilitate the evolution of organismal complexity (Lynch et al. 2001; Lynch 2002; Lynch and Conery 2003).
Even here, there is room for caution. Although many structural features of 5′ UTRs (including their lengths) are known to influence the rate of protein synthesis by modifying the efficiency of translation, prior to accepting natural selection as the underlying explanation for such features, further consideration of semineutral processes may prove worthwhile. For example, upstream open-reading frames (uORFs) can slow the rate of translation by causing the ribosome to terminate and/or reinitiate, and secondary UTR structure and/or internal ribosome entry sites may have similar indirect roles. A number of authors have suggested that uORFs serve an adaptive function (Morris and Geballe 2000; Meijer and Thomas 2002; Vilela and McCarthy 2003). However, uORFs are generally on the order of 20 codons in length, approximately what is expected by chance, and transcripts from some uORF-containing genes are subject to degradation by the nonsense-mediated decay pathway (Ruiz-Echevarria and Peltz 2000). In principle, a number of uORFs may simply exist because their stop codons have neutralized the effects of a PSC, enabling their carrier alleles to perform at normal levels.