What is the advantage of circular genomes for bacteria and linear genomes for other organisms?

Bacteria are a great group of organisms. They have circular genomes and never moved toward linear genomes, while other organisms show the opposite strategy and don't have circular genomes (disregarding their cytoplasmic genomes). Why have they followed these different strategies?


You can package linear genomes much more efficiently than circular genomes, and bacteria simply don't require the information density to be prosperous.

To be a bit more specific, it's the torsional strain put on the double helix while it's being wound that makes the difference. Linear genomes can be wound around histones, and these histones can be further formed into more complex and dense patterns so that the DNA of a linear genome is condensed to a mere fraction of the size it was before. The high density allows cell division to occur much more easily than it otherwise would in eukaryotic cells.

Circular genomes cannot get rid of the torsional stress the way linear genomes can. Attempting a packaging scheme similar to that of eukaryotic cells would result in the DNA breaking apart at some point before you reached a high level of information density.

I'll look through my BioChem books again to find a reference later on, but hopefully you should have a solid idea of "why" from the above. For now you can do a little thought experiment: Think of all the ways you can wind a single piece of string or thread around things to make it as dense as possible. Then think of all the ways you can try to achieve the same density with a rubber band without breaking it. The string should win out in both the ease with which you can manipulate it and the level of density you can achieve.


Bacterial Genomes

All living organisms contain DNA. This amazing macromolecule encodes all of the information needed to program the cell's activities including reproduction, metabolism and other specialized functions. DNA is comprised of two strands of deoxynucleotides. Each deoxynucleotide contains a phosphate, a 5-carbon sugar (2-deoxyribose) and one of four nitrogenous bases: adenine, cytosine, thymine or guanine. The phosphate and sugar make up the backbone of each strand of DNA, while the bases are responsible for holding the two strands together via hydrogen bonds in a structure called the double helix (see figure). The order of the bases in a DNA strand contains the coded genetic information. All of the DNA found in an organism is collectively referred to as the genome. The human genome is comprised of 23 pairs of linear chromosomes, and approximately 3000 megabases (Mb) of DNA, while the genome of the bacterium Escherichia coli consists of a single 4.6 Mb circular chromosome. By studying the genomes of bacteria we are able to better understand their metabolic capabilities, their ability to cause disease and also their capacity to survive in extreme environments.

Many of the well-studied bacterial model organisms, such as E. coli, have a single circular chromosome. However, advances in molecular genetics have shown that bacteria possess more complex arrangements of their genetic material than just a single circular chromosome per cell. Some bacterial genomes are comprised of multiple chromosomes and/or plasmids and many bacteria harbor multiple copies of their genome per cell. The following are a few examples of bacteria with unusual genomes.


What is Linear DNA?

Linear DNA is present in the eukaryotic genomes within the cell nucleus. The linear DNA is composed of two free ends, and therefore it is an open structure. Linear DNA can be isolated and separated on agarose gel media, although due to the bulkiness of the DNA, a smear would be observed on the gel. In order to isolate and separate desired fragments of linear DNA, the DNA can be cut using restriction endonucleases and then observed on a gel run.
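The effect of restriction digestion on fragment counts can be illustrated with a minimal sketch. It assumes idealized cut positions given as base-pair coordinates; the function name and the example coordinates are hypothetical and only serve the illustration. The point it shows is that n cuts in a linear molecule give n + 1 fragments, whereas n cuts in a circular molecule give n fragments.

```python
def fragment_lengths(length, cut_sites, circular=False):
    """Return sorted fragment lengths after cutting at the given positions.

    length    -- total molecule length in base pairs
    cut_sites -- cut positions (0 < position < length), counted from one end
                 for linear DNA or from an arbitrary origin for circular DNA
    """
    sites = sorted(set(cut_sites))
    if not sites:
        return [length]  # uncut molecule stays in one piece
    # Distances between consecutive cut sites
    inner = [b - a for a, b in zip(sites, sites[1:])]
    if circular:
        # The piece spanning the origin joins the last and first cut sites
        wrap = length - sites[-1] + sites[0]
        return sorted(inner + [wrap])
    # Linear DNA keeps two separate end fragments
    return sorted([sites[0]] + inner + [length - sites[-1]])


# Hypothetical 10 kb molecule with three cut sites
print(fragment_lengths(10000, [2000, 5000, 9000]))                 # linear: 4 fragments
print(fragment_lengths(10000, [2000, 5000, 9000], circular=True))  # circular: 3 fragments
```

On a gel, this difference in fragment number is one simple way to tell whether a molecule was linear or circular before digestion.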

Figure 01: Linear DNA

Replication of linear DNA is a complex process that involves many mechanisms. Replication takes place in a bidirectional manner, with two replication forks forming at each origin. Linear DNA may contain many origins of replication, as the molecule is long and complex. Replication continues until termination, which requires solving the end-replication problem at the telomeric sequences that cap linear DNA.


What is Eukaryotic Genome?

Eukaryotes are organisms that have a nucleus and membrane-bound organelles. They have extensive cellular compartments that carry out distinct functions. Within the nucleus of eukaryotes, we can find the eukaryotic genome, which contains the whole genetic information of the organism. The eukaryotic genome exists mainly as linear chromosomes, which are made of DNA molecules together with histone proteins. In the human genome, there is a total of 46 chromosomes in each somatic cell. The nuclear membrane encloses all these chromosomes. Hence, their information reaches the cytoplasm only after it is transcribed into mRNA molecules. Mitochondria and chloroplasts also contain some DNA molecules; however, these are not part of the nuclear genome.

Figure 02: Eukaryotic Genome

The eukaryotic genome is less compact, and it contains repetitive sequences as well as many non-coding sequences such as introns and spacer DNA. Compared with the prokaryotic genome, the eukaryotic genome is bigger and can have billions of base pairs. Furthermore, it contains many genes with multiple copies.


Definition

Linear DNA refers to the DNA with two ends while circular DNA refers to the DNA with no ends.

Examples

The genetic material in the nucleus of eukaryotes is linear DNA, while the genetic material of most prokaryotes, as well as mtDNA and cpDNA, is circular DNA.

Occurrence

In eukaryotes, linear DNA occurs inside the nucleus while circular DNA occurs inside organelles; in prokaryotes, circular DNA occurs in the cytoplasm.

Size of DNA

Generally, linear DNA is large in size while circular DNA is small in size.

Organization

Furthermore, linear DNA undergoes tight coiling and dense packing around histones inside the nucleus, while circular DNA is compacted mainly by supercoiling rather than by histone-based packing.

Ease of Transcription

Linear DNA is easy to transcribe while large circular DNA is difficult to transcribe due to the torsional strain that occurs during DNA unwinding.

Presence of Telomeres

While linear DNA contains telomeres, circular DNA does not contain telomeres.

End Replication Problem

Moreover, linear DNA faces the end-replication problem while circular DNA does not.

In Plasmids

Most plasmids are circular and are typically supercoiled, although some plasmids are linear.

Conclusion

Linear DNA is a DNA structure with two ends. Generally, eukaryotic chromosomes are linear. Moreover, they consist of a large number of base pairs. On the other hand, circular DNA is the DNA with no ends. Prokaryotic chromosomes are circular while both mitochondrial and chloroplast DNAs are circular as well. However, circular DNA is small in size. Therefore, the main difference between linear and circular DNA is the structure of DNA.


Many other bacteria have multiple chromosomes, and the genus Vibrio includes several members with two circular chromosomes [16]. These include V. cholerae, V. parahaemolyticus, V. vulnificus and V. fluvialis [16].

Bacteria                     Chromosome organization
Agrobacterium tumefaciens    One linear and one circular
Bacillus subtilis            Single and circular
Borrelia burgdorferi         Single and linear
Brucella abortus             Two circular
Brucella melitensis          Two circular
Brucella ovis                Two circular
Brucella suis biovar 1       Two circular
Brucella suis biovar 2       Two circular
Brucella suis biovar 4       Two circular
Escherichia coli             Single and circular
Paracoccus denitrificans     Three circular
Pseudomonas aeruginosa       Single and circular
Rhodobacter sphaeroides      Two circular
Streptomyces griseus         Linear
Vibrio cholerae              Two circular
Vibrio fluvialis             Two circular
Vibrio parahaemolyticus      Two circular
Vibrio vulnificus            Two circular

The Circular Genome Viewer

CGView (http://wishart.biology.ualberta.ca/cgview/) is a Java program developed in 2005 as a tool for generating high-quality, navigable maps of circular genomes [8]. Originally intended for bacterial genomes, it has proven to be popular for organellar genomes as well. CGView supports a custom XML (Extensible Markup Language) input format for describing the contents and appearance of a map, which the program then converts into graphical format. Bitmap images (PNG or JPG) or vector-based output in SVG (Scalable Vector Graphics) format can be generated. The SVG format offers advantages for image editing and printing, but the file size can be problematic when complex maps are generated. The contents of a sample XML file and the resulting map generated by CGView are shown in Figure 1. Simpler input formats are supported, which allow the positions of genes to be described, but which offer less control over how the information is displayed. Also provided is a well-documented API (application program interface), which allows the CGView Java code to be used in other applications. For example, the BRIG program [9] uses CGView code and its associated API to generate maps displaying the results of sequence similarity comparisons between a bacterial genome of interest and other genomes.

CGView converts the contents of an XML file (A) into a graphical map (B). The XML describes general characteristics of the map (height, width and font styles and sizes, for example) as well as the features that are to be depicted. Each feature can have one or more ranges drawn on the map, the positions of which are described using ‘featureRange’ elements. Features are grouped by ‘featureSlot’ elements, which represent rings on the graphical map. A full description of the XML format and additional examples is available on the CGView Web site.
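As a rough illustration of the nesting described in this caption, the following sketch builds a small XML document with Python's standard library. Only the element names 'featureSlot' and 'featureRange' come from the text; the root element name, the 'feature' element and all attribute names used here are assumptions made for the example and should be checked against the format description on the CGView Web site.

```python
import xml.etree.ElementTree as ET


def build_map_xml(sequence_length, genes):
    """Build a CGView-style XML string (illustrative only).

    genes -- list of (label, start, stop, strand) tuples, where strand is
             "direct" or "reverse". Attribute names are assumptions.
    """
    root = ET.Element("cgview", {
        "sequenceLength": str(sequence_length),
        "height": "600",
        "width": "600",
    })
    slots = {}  # one featureSlot (ring) per strand
    for label, start, stop, strand in genes:
        if strand not in slots:
            slots[strand] = ET.SubElement(root, "featureSlot", {"strand": strand})
        feature = ET.SubElement(slots[strand], "feature", {"label": label})
        # Each feature can carry one or more ranges; this sketch adds one
        ET.SubElement(feature, "featureRange",
                      {"start": str(start), "stop": str(stop)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical genes on a 5000 bp circular sequence
xml_text = build_map_xml(5000, [
    ("geneA", 100, 900, "direct"),
    ("geneB", 1500, 2200, "reverse"),
    ("geneC", 3000, 4100, "direct"),
])
print(xml_text)
```

Grouping features by strand mirrors the idea that each 'featureSlot' corresponds to one ring on the map.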

The CGView application remains a popular genome visualization tool; however, the process of creating a map is intimidating and laborious for many users. First, CGView is a command-line tool, meaning that users must enter specific commands and options in a command-line environment. Although command-line interfaces offer important advantages, many users understandably prefer an intuitive graphical user interface provided via a Web server or stand-alone application. Second, CGView itself does not identify sequence features or perform sequence analyses. Instead, the user must use other software to identify features of interest and then provide that information in a format that CGView supports. Although the CGView application download includes documentation on how to do this as well as a script for converting GenBank or EMBL files into an XML file suitable for CGView, many users are likely to lack the experience or inclination to build the required input files.

Map creation simplified: the CGView Server

CGView’s command-line interface and API allow it to be easily incorporated into more capable visualization tools or pipelines. One such tool is the CGView Server (http://stothard.afns.ualberta.ca/cgview_server/), a Web server released in 2008 that offers a convenient interface to CGView and has built-in analysis capabilities [10]. For example, up to three sequence data sets (protein or DNA) can be uploaded to the server, along with the primary genome sequence of interest (also termed the ‘reference sequence’). BLAST is used to compare the reference sequence with each of the uploaded data sets, and the results are included on the map. If the reference sequence is uploaded in GenBank or EMBL format, feature information is extracted and also displayed. A variety of other feature types can be identified or calculated and displayed (open reading frames, start and stop codons, GC content and GC skew) or uploaded in a simple tab-delimited format. Internally, the CGView Server generates the map using CGView and an XML input file that it builds according to the user-supplied options, information extracted from the reference sequence and the BLAST analysis results. Users can choose to have the server return a map displaying the entire sequence, or, thanks to the zooming capabilities of CGView, a portion of the genome at an expanded size (Figure 2).
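Two of the calculated feature types mentioned here, GC content and GC skew, are simple to compute. The sketch below uses the common definitions GC content = (G + C) / window length and GC skew = (G - C) / (G + C) over sliding windows. It is not the CGView Server's own code; the function name, window size and step are assumptions chosen for illustration.

```python
def gc_windows(sequence, window, step):
    """Yield (start, gc_content, gc_skew) for each full window."""
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # Skew is undefined without G or C; report 0.0 in that case
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_content, gc_skew


# Toy sequence; real genomes would use windows of hundreds or thousands of bases
for start, content, skew in gc_windows("GGGCATATGCCC", window=4, step=4):
    print(start, content, skew)
```

This sketch treats the sequence as linear. For a circular genome, the final windows would normally wrap around to the start of the sequence.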

A map generated using the CGView Server, showing a full view of the genome (A) and an expanded view of a region of interest (B). The contents of the feature rings (starting with the outermost ring) are as follows: Ring 1 and Ring 2 depict features from the forward and reverse strands, respectively, read from the primary sequence file (E. coli NRG857c, accession CP001855); Ring 3 (E. coli LF82, accession CU651637), Ring 4 (E. coli K12 substr. MG1655, accession NC_000913) and Ring 5 (E. coli O157:H7 str. Sakai, accession NC_002695) show BLAST comparison results (BLASTN) with the primary sequence; Ring 6 shows putative NRG857c genomic islands indicative of horizontal gene transfer [11]; Ring 7 shows GC content; and Ring 8 shows GC skew. The BLAST comparison results are drawn at partial opacity, so darker regions indicate the presence of multiple hits to the corresponding portion of the reference sequence.

Some notable limitations of the CGView Server include support for only three comparison sequence data sets, the absence of vector-based output and reduced control over the appearance of maps compared with when the CGView application is used directly. The sequence comparison limitation exists to reduce the workload for the server. Vector-based output, SVG in the case of CGView, is not supported because the resulting files can be too large to send by e-mail even when compressed (the CGView Server sends the final map to the e-mail address supplied by the user). Instead, all maps are 3000 × 3000 pixels in PNG format. Finally, the reduced control over map appearance is the result of providing a simplified Web interface for controlling map appearance that does not provide the full flexibility of the XML format used by CGView.

Comparing thousands of genomes using the CGView Comparison Tool

Following the release of the CGView Server, we frequently received requests from users of the server who wanted maps to be modified. Typical requests included the addition of further comparison data sets, the changing of font sizes or feature colors, the creation of larger maps and the labeling of specific genes or features of interest. Over time, we developed a software pipeline that we used to handle these requests. This pipeline consists of a variety of scripts to build or modify complex maps potentially involving thousands of sequences. Eventually, we released this pipeline as the CGView Comparison Tool (CCT) (http://stothard.afns.ualberta.ca/downloads/CCT/) [12].

Although CCT is a command-line tool, its use is simplified through wrapper scripts that automate the map building process, so creating a map can involve only a few simple commands. For example, the map comparing Escherichia coli O157:H7 str. Sakai with 100 additional E. coli genome sequences (Figure 3) required four simple commands to generate (one to download the primary sequence, one to start a map project, one to download the comparison sequences and one to complete the map). The last command generates several maps automatically, differing in terms of size and level of detail, as well as in terms of how the BLAST comparisons are done (at the nucleotide level or at the level of translated coding sequences). The maps depicting translated coding sequence comparisons also, by default, display COG (Cluster of Orthologous Groups) classifications, generated through the use of a COG sequence database [13]. The contents and appearance of the maps can be changed from the default settings using a simple configuration file present in the map project directory, and by using command-line options. Additional custom feature types can be shown, as described in the tutorials section of the CCT Web site.

CCT map comparing E. coli O157:H7 str. Sakai (accession BA000007) with 100 additional E. coli genome sequences. A full-genome view (A) and zoomed view (B) are shown, with the latter centered on Shiga toxin I subunit A and B genes, labeled as ECs2974 and ECs2973, respectively. The contents of the feature rings (starting with the outermost ring) are as follows: Ring 1: COG functional categories for forward strand coding sequences; Ring 2: forward strand sequence features; Ring 3: reverse strand sequence features; Ring 4: COG functional categories for reverse strand coding sequences. The next 100 rings show regions of sequence similarity detected by BLAST comparisons conducted between CDS (coding DNA sequence) translations from the reference genome and those from 100 E. coli comparison genomes.

CCT is well suited to the analysis of large collections of complete or partial genomes generated through high-throughput sequencing, as thousands of comparison genomes can be displayed on a single map simply by placing each of their respective contig files (in GenBank, EMBL or FASTA format) in the ‘comparison_genomes’ directory of a CCT project. Metagenomes can be included as comparison genomes, in which case the map will serve to indicate which portions of the reference genome are similar to sequences in the metagenomes. In some scenarios, it may be of interest to generate many different maps, using a variety of reference genomes. CCT, by virtue of its command-line interface, can be run repeatedly from a script to generate separate maps for a group of reference sequences of interest. Alternatively, users can use the included ‘build_blast_atlas_all_vs_all.sh’ script. This script generates a separate map for each genome in a group of completed genomes, comparing the genome to all the others in the collection. The advantage of generating multiple maps in this manner is that the non-conserved regions of each genome can be found as regions lacking BLAST hits when the genome serves as the reference.
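The file-staging step described above (placing each comparison genome file in the project's ‘comparison_genomes’ directory) can be scripted. The sketch below shows one way to do that in Python. Only the directory name comes from the text; the function name, the list of file extensions and the example paths are assumptions, and the sketch does not invoke any CCT command.

```python
import shutil
from pathlib import Path

# Assumed file extensions for GenBank, EMBL and FASTA files
SEQUENCE_SUFFIXES = {".gbk", ".gb", ".embl", ".fasta", ".fa", ".fna"}


def stage_comparison_genomes(source_dir, project_dir):
    """Copy sequence files into <project_dir>/comparison_genomes.

    Returns the sorted names of the files that were copied.
    """
    target = Path(project_dir) / "comparison_genomes"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        # Skip subdirectories and anything that is not a sequence file
        if path.is_file() and path.suffix.lower() in SEQUENCE_SUFFIXES:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

A call such as stage_comparison_genomes("downloads", "my_project") would leave the original files untouched, which makes it straightforward to reuse the same downloads when building maps for several reference genomes.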


DNA supercoiling refers to the over- or under-winding of a DNA strand, and is an expression of the strain on that strand. Supercoiling is important in a number of biological processes, such as compacting DNA. Additionally, certain enzymes such as topoisomerases are able to change DNA topology to facilitate functions such as DNA replication or transcription. Mathematical expressions are used to describe supercoiling by comparing different coiled states to relaxed B-form DNA.

Supercoiled Structure of Circular DNA: This is a supercoiled structure of circular DNA molecules with low writhe. Note that the helical nature of the DNA duplex is omitted for clarity.

As a general rule, the DNA of most organisms is negatively supercoiled.

In a “relaxed” double-helical segment of B-DNA, the two strands twist around the helical axis once every 10.4 to 10.5 base pairs of sequence. Adding or subtracting twists, as some enzymes can do, imposes strain. If a DNA segment under twist strain were closed into a circle by joining its two ends and then allowed to move freely, the circular DNA would contort into a new shape, such as a simple figure-eight. Such a contortion is a supercoil.

The simple figure eight is the simplest supercoil, and is the shape a circular DNA assumes to accommodate one too many or one too few helical twists. The two lobes of the figure eight will appear rotated either clockwise or counterclockwise with respect to one another, depending on whether the helix is over or underwound. For each additional helical twist being accommodated, the lobes will show one more rotation about their axis.

The noun form “supercoil” is rarely used in the context of DNA topology. Instead, global contortions of a circular DNA, such as the rotation of the figure-eight lobes above, are referred to as writhe. The above example illustrates that twist and writhe are interconvertible. “Supercoiling” is an abstract mathematical property representing the sum of twist and writhe. The twist is the number of helical turns in the DNA and the writhe is the number of times the double helix crosses over on itself (these are the supercoils).

Extra helical twists are positive and lead to positive supercoiling, while subtractive twisting causes negative supercoiling. Many topoisomerase enzymes sense supercoiling and either generate or dissipate it as they change DNA topology. DNA of most organisms is negatively supercoiled.
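The relationship between twist, writhe and the sign of supercoiling can be made concrete with a small calculation. The sum of twist and writhe is usually called the linking number (Lk), and the degree of supercoiling is often reported as the superhelical density, sigma = (Lk - Lk0) / Lk0, where Lk0 is the linking number of the relaxed molecule. The sketch below assumes 10.5 base pairs per helical turn for relaxed B-DNA; the plasmid size and linking number in the example are hypothetical.

```python
BP_PER_TURN = 10.5  # relaxed B-DNA; the text gives 10.4 to 10.5


def relaxed_linking_number(length_bp, bp_per_turn=BP_PER_TURN):
    """Linking number (Lk0) of a relaxed closed circle of the given length."""
    return length_bp / bp_per_turn


def linking_number(twist, writhe):
    """Linking number as the sum of twist and writhe."""
    return twist + writhe


def superhelical_density(lk, lk0):
    """Negative values mean underwound (negatively supercoiled) DNA."""
    return (lk - lk0) / lk0


# Hypothetical 5250 bp plasmid that is underwound by 30 turns
lk0 = relaxed_linking_number(5250)        # 500 turns when relaxed
lk = linking_number(twist=500, writhe=-30)
print(lk0, lk, superhelical_density(lk, lk0))
```

Because the linking number of a closed circle cannot change without breaking a strand, a deficit of turns has to show up as altered twist, as writhe, or as a mixture of both.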

In part because chromosomes may be very large, segments in the middle may act as if their ends are anchored. As a result, they may be unable to distribute excess twist to the rest of the chromosome or to absorb twist to recover from underwinding—the segments may become supercoiled, in other words. In response to supercoiling, they will assume an amount of writhe, just as if their ends were joined.

Supercoiled DNA forms two structures: a plectoneme or a toroid, or a combination of both. A negatively supercoiled DNA molecule will produce either a one-start left-handed helix, the toroid, or a two-start right-handed helix with terminal loops, the plectoneme. Plectonemes are typically more common in nature, and this is the shape most bacterial plasmids will take. For larger molecules, it is common for hybrid structures to form; for example, a loop on a toroid can extend into a plectoneme. If all the loops on a toroid extend, it becomes a branch point in the plectonemic structure.

The Importance of DNA supercoiling

DNA supercoiling is important for DNA packaging within all cells. Because the length of DNA can be thousands of times that of a cell, packaging this genetic material into the cell or nucleus (in eukaryotes ) is a difficult feat. Supercoiling of DNA reduces the space and allows for much more DNA to be packaged. In prokaryotes, plectonemic supercoils are predominant, because of the circular chromosome and relatively small amount of genetic material. In eukaryotes, DNA supercoiling exists on many levels of both plectonemic and solenoidal supercoils, with the solenoidal supercoiling proving the most effective in compacting the DNA. Solenoidal supercoiling is achieved with histones to form a 10 nm fiber. This fiber is further coiled into a 30 nm fiber, and further coiled upon itself numerous times more.

DNA packaging is greatly increased during nuclear division events such as mitosis or meiosis, where DNA must be compacted and segregated to daughter cells. Condensins and cohesins are structural maintenance of chromosome (SMC) proteins that aid in the condensation of sister chromatids and the linkage of the centromere in sister chromatids. These SMC proteins induce positive supercoils.

Supercoiling is also required for DNA and RNA synthesis. Because DNA must be unwound for DNA and RNA polymerase action, supercoils will result. The region ahead of the polymerase complex will be unwound; this stress is compensated with positive supercoils ahead of the complex. Behind the complex, DNA is rewound and there will be compensatory negative supercoils. It is important to note that topoisomerases such as DNA gyrase (a type II topoisomerase) play a role in relieving some of the stress during DNA and RNA synthesis.


Studying microbial genomes

In the years preceding the development of full genome sequencing techniques we were restricted to studying microbial genomes in the lab using techniques such as PFGE.

Pulsed Field Gel Electrophoresis (PFGE)

The principle underlying PFGE is very similar to normal gel electrophoresis. PFGE, however, allows us to resolve far larger, 'genome-scale' pieces of DNA (greater than 20 kilobases in size). PFGE is still an important technique used to estimate the size of microbial genomes and in epidemiology studies.

You can learn more about similar, basic genomics techniques on the 'Recombinant DNA and genetic techniques' page.

DNA Sequencing

DNA sequencing has been an enormous advancement in the field of microbial genomics and indeed genetics as a whole, allowing us to amass vast amounts of genetic data from our organisms of choice. The first method for sequencing DNA was developed by Frederick Sanger and his group in 1977. Their method, termed Sanger sequencing, was a platform for innovation in the field of DNA sequencing, and we now have methods for sequencing entire bacterial genomes with relative ease. Whole genome sequencing produces immense amounts of data from which we can derive a catalogue of important information. From the need to analyse this data, the field of bioinformatics has flourished and become an integral part of genetics research.

DNA sequencing identifies and records, in order, every nucleotide that makes up a piece of DNA.

Bioinformatics

Bioinformatics is the use of computing technologies to manage and analyse biological data. The huge amounts of genomic data produced through DNA sequencing can often be very confusing and difficult to analyse. Therefore bioinformatics is often required to derive useful information from this data. You can learn more about some of the bioinformatics tools we use to analyse genetic information by clicking 'access topic related resources' at the top of the page. Using bioinformatics techniques, we can map the positions of genes, elucidate their functions and infer evolutionary relationships. Together with experimental work, we can use this information to learn more about how microbes ultimately survive and cause disease.

Take a look at some of our genomics work over at the 'our research' page!


1. Introduction

Marseilleviridae is an expanding family of large double-stranded DNA viruses infecting free-living amoeba of the Acanthamoeba genus. Their icosahedral capsids of 250 nm diameter enclose a 340 to 390 kb genome predicted to encode an average of 500 protein-coding genes [1,2,3,4,5,6,7,8,9,10,11,12]. Among these genes, some code for unexpected functions for a virus, the most surprising being homologues to cellular histones [1,2]. Viruses from this family belong to the NCLDVs (for nucleocytoplasmic large DNA viruses), i.e., the Nucleocytoviricota phylum, according to the latest International Committee on Taxonomy of Viruses (ICTV) classification [13,14]. Marseilleviruses’ replication cycles start with their phagocytosis by the Acanthamoeba host. Once in the cytoplasm, they form the so-called “viral factory” in the vicinity of the nucleus where virion assembly and DNA packaging occur simultaneously [1]. Mature particles are then released through cell lysis roughly 8 h post-infection (pi) [1]. However, the duration of the replication cycle is variable among Marseilleviridae with strains for which virions are released at 13 h up to 24 h pi. Since marseilleviruses encode a complete transcription apparatus and the host nucleus appears to remain intact during the entire cycle, it was initially assumed that marseilleviruses were bona fide cytoplasmic viruses, without a nuclear phase. However, it was subsequently shown that virally encoded RNA polymerase subunit proteins are not packaged within the virions, thus precluding the transcription of viral genes to start [10]. As a workaround, nuclear proteins are actively, albeit transiently, recruited by the viral factory to initiate the transcription of viral genes, thus placing marseilleviruses between viruses strictly replicating within the cytoplasm and those involving an intranuclear phase [10].

Marseillevirus T19 was the first Marseilleviridae to be isolated by co-culturing with Acanthamoeba castellanii [1]. Since then, several strains were isolated using the same approach, mainly from aquatic samples of different continents (Asia [9,12,15], Africa [4,5], South America [6,11], Europe [1,2,3,8] and Australia [7,10]). In addition, marseillevirus-like genomic sequences were identified in environmental metagenomics assembled data [16]. Among the isolated strains, thirteen were fully sequenced (Table S1), and their phylogeny shows that they belong to five distinct clades [10,15] (Figure S1). From the analysis of the genes encoded in these genomes, it was estimated that roughly 25% of them are of potential cellular origin, making horizontal gene transfers (HGT) a contributing factor shaping the Marseilleviridae genomes [1]. Surprisingly, only 23% of these exchanges involve the Amoebozoa host, as opposed to 45% for bacteria and bacteriophages [1]. Even more remarkably, this large fraction of bacteria-related genes is subject to strong purifying selection, and thus probably contributes to viral fitness [7]. One striking example is the Marseilleviridae-encoded restriction–modification (RM) system that involves restriction endonucleases and DNA methyltransferases of bacterial origin [17]. It is suspected that it serves as a weapon against amoeba intracellular parasites, thus giving the virus a selective advantage.

Besides evolutionary questions, marseilleviruses’ physiology has been examined through several genome-wide surveys using various omics data. First, proteomic data of the viral particles of three Marseilleviridae members were produced, namely marseillevirus [1], noumeavirus and melbournevirus [10]. This not only revealed the proteins that build the structure of the Marseilleviridae virions, but also those packaged within it that could be essential for initiating the viral replication. In addition, the marseillevirus’ transcriptional activity during an infection cycle in A. castellanii was recently surveyed by RNA sequencing (RNA-seq), showing that the host translation apparatus is downregulated during the infection [18]. This now provides us with a sufficient body of data to conduct an in-depth comparative genomics study of the Marseilleviridae family.

In this study, we first experimentally confirm the circular structure of marseillevirus genomes. Using available genomic, proteomic and transcriptomic data, we then reveal a strong bias in the distribution of marseillevirus genes. We examine the genomic rearrangements as well as the distribution of several gene categories along the genomes. More specifically, we unveil the uneven distribution of the core genes (i.e., genes conserved in all Marseilleviridae), the virion-associated genes and the paralogous genes (i.e., genes that were duplicated during Marseilleviridae evolution). This work helps us to better understand the global organization of the Marseilleviridae genomes, as well as the evolution and physiology of this viral family.


2.4. The Repetitive DNA Content of Genomes

Repetitive DNA is the one aspect of genome structure that we have not examined in detail. Figure 2.2 (page 34) showed us that repetitive DNA is found in all organisms and that in some, including humans, it makes up a substantial fraction of the entire genome. There are various types of repetitive DNA, and several classification systems have been devised. The scheme that we will use begins by dividing the repeats into those that are clustered into tandem arrays and those that are dispersed around the genome.

2.4.1. Tandemly repeated DNA

Tandemly repeated DNA is a common feature of eukaryotic genomes but is found much less frequently in prokaryotes. This type of repeat is also called satellite DNA because DNA fragments containing tandemly repeated sequences form ‘satellite’ bands when genomic DNA is fractionated by density gradient centrifugation (see Technical Note 2.2). For example, when broken into fragments of 50 kb or more in length, human DNA forms a main band (buoyant density 1.701 g cm⁻³) and three satellite bands (1.687, 1.693 and 1.697 g cm⁻³). The main band contains DNA fragments made up mostly of single-copy sequences with GC compositions close to 40.3%, the average value for the human genome. The satellite bands contain fragments of repetitive DNA, and hence have GC contents and buoyant densities that are atypical of the genome as a whole (Figure 2.24).
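The link between GC content and buoyant density can be illustrated with a short Python sketch. It assumes a commonly quoted empirical approximation for double-stranded DNA in CsCl gradients, density ≈ 1.660 + 0.098 × (GC fraction) g cm⁻³. This relation is not given in the text above, and the function name and default constants are illustrative assumptions rather than part of the source.

```python
# Assumed empirical relation for DNA in CsCl (not from the text above):
#   density (g cm^-3) ~= 1.660 + 0.098 * GC_fraction
def gc_fraction_from_density(density, intercept=1.660, slope=0.098):
    """Estimate the GC fraction (0 to 1) of DNA from its buoyant density."""
    return (density - intercept) / slope


if __name__ == "__main__":
    # Band densities quoted in the text for human DNA.
    bands = {"main band": 1.701, "satellite": 1.687,
             "satellite ": 1.693, "satellite  ": 1.697}
    for label, density in bands.items():
        gc = gc_fraction_from_density(density)
        print(f"{label.strip():<10} {density:.3f} g cm^-3 -> GC ~ {gc:.1%}")
```

Under this approximation the main band comes out at roughly 42% GC, close to but not identical with the 40.3% genome average quoted above, and the three satellite bands all fall below it, which is why they separate from the main band.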

Satellite DNA is found at centromeres and elsewhere in eukaryotic chromosomes

The satellite bands in density gradients of eukaryotic DNA are made up of fragments composed of long series of tandem repeats, possibly hundreds of kb in length. A single genome can contain several different types of satellite DNA, each with a different repeat unit, these units being anything from < 5 to > 200 bp. The three satellite bands in human DNA include at least four different repeat types.

We have already encountered one type of human satellite DNA, the alphoid DNA repeats found in the centromere regions of chromosomes (see page 38). Although some satellite DNA is scattered around the genome, most is located in the centromeres, where it may play a structural role, possibly as binding sites for one or more of the special centromeric proteins (see page 39). Alternatively, the repetitive DNA content of the centromere might be a reflection of the fact that this is the last region of the chromosome to be replicated. In order to delay its replication until the very end of the cell cycle, the centromere DNA must lack sequences that can act as origins of replication. The repetitive nature of centromeric DNA may be a means of ensuring that such origins are absent (Csink and Henikoff, 1998).

Minisatellites and microsatellites

Although they do not appear in satellite bands on density gradients, two other types of tandemly repeated DNA are also classed as ‘satellite’ DNA. These are minisatellites and microsatellites. Minisatellites form clusters up to 20 kb in length, with repeat units up to 25 bp; microsatellite clusters are shorter, usually < 150 bp, and the repeat unit is usually 13 bp or less.

Minisatellite DNA is a second type of repetitive DNA that we are already familiar with because of its association with structural features of chromosomes. Telomeric DNA, which in humans comprises hundreds of copies of the motif 5′-TTAGGG-3′ (see Figure 2.10), is an example of a minisatellite. We know a certain amount about how telomeric DNA is formed, and we know that it has an important function in DNA replication (Section 13.2.4). In addition to telomeric minisatellites, some eukaryotic genomes contain various other clusters of minisatellite DNA, many, although not all, near the ends of chromosomes. The functions of these other minisatellite sequences have not been identified.

The function of microsatellites is equally mysterious. The typical microsatellite consists of a 1-, 2-, 3- or 4-bp unit repeated 10 or more times, as illustrated by the microsatellites in the human β T-cell receptor locus (Section 1.2.1). Although each microsatellite is relatively short, there are many of them in the genome (see Table 1.3). In humans, for example, microsatellites with a CA repeat, such as:

Although their function, if any, is unknown, microsatellites have proved very useful to geneticists. Many microsatellites are variable, meaning that the number of repeat units in the array is different in different members of a species. This is because ‘slippage’ sometimes occurs when a microsatellite is copied during DNA replication, leading to insertion or, less frequently, deletion of one or more of the repeat units (see Figure 14.5). No two humans alive today have exactly the same combination of microsatellite length variants: if enough microsatellites are examined then a unique genetic profile can be established for every person. The only exceptions are genetically identical twins. Genetic profiling is well known as a tool in forensic science (Figure 2.25), but identification of criminals is a fairly trivial application of microsatellite variability. More sophisticated methodology makes use of the fact that a person's genetic profile is inherited partly from the mother and partly from the father. This means that microsatellites can be used to establish kinship relationships and population affinities, not only for humans but also for other animals, and for plants.
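As a rough illustration of how microsatellite length variants could be read off a sequence, the following Python sketch scans a DNA string for short units repeated in tandem and reports the copy number of each array. The function name, the thresholds and the two example alleles are hypothetical. Real genetic profiling measures the lengths of PCR products rather than scanning raw sequence in this way.

```python
import re


def find_microsatellites(seq, max_unit=4, min_repeats=5):
    """Return (start, unit, copies) for each tandem array of a short unit.

    max_unit and min_repeats are illustrative thresholds, not standard values.
    """
    # A unit of 1..max_unit bases (shortest tried first) followed by at
    # least (min_repeats - 1) further copies of exactly the same unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


if __name__ == "__main__":
    # Two hypothetical alleles that differ by one CA unit, as slippage might produce.
    allele_a = "GGT" + "CA" * 12 + "TTG"
    allele_b = "GGT" + "CA" * 13 + "TTG"
    print(find_microsatellites(allele_a))
    print(find_microsatellites(allele_b))
```

Comparing the copy numbers found at the same locus in different individuals is, in simplified form, the comparison that underlies a genetic profile.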

Figure 2.25

The use of microsatellite analysis in genetic profiling. In this example, microsatellites located on the short arm of chromosome 6 have been amplified by the polymerase chain reaction (PCR; Section 4.3). The PCR products are labeled with a blue or green …

2.4.2. Interspersed genome-wide repeats

Tandemly repeated DNA sequences are thought to have arisen by expansion of a progenitor sequence, either by replication slippage, as described for microsatellites, or by DNA recombination processes (Section 14.3). Both of these events are likely to result in a series of linked repeats, rather than individual repeat units scattered around the genome. Interspersed repeats must therefore have arisen by a different mechanism, one that can result in a copy of a repeat unit appearing in the genome at a position distant from the location of the original sequence. The most frequent way in which this occurs is by transposition, and most interspersed repeats have inherent transpositional activity.

Transposition via an RNA intermediate

The precise mechanics of transposition need not worry us until we deal with recombination and related rearrangements to the genome in Section 14.3. All that we need to know at this point is that there are two alternative modes of transposition, one that involves an RNA intermediate and one that does not. The version that involves an RNA intermediate is called retrotransposition. The basic mechanism involves three steps (Figure 2.26):

1. An RNA copy of the transposon is synthesized by the normal process of transcription.

2. The RNA transcript is copied into DNA, which initially exists as an independent molecule outside of the genome. This conversion of RNA to DNA, the reverse of the normal transcription process, requires a special enzyme called reverse transcriptase. Often the reverse transcriptase is coded by a gene within the transposon and is translated from the RNA copy synthesized in step 1.

3. The DNA copy of the transposon integrates into the genome, possibly back into the same chromosome occupied by the original unit, or possibly into a different chromosome. The end result is that there are now two copies of the transposon, at different points in the genome.
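The three steps above can be sketched as a toy string model in Python. This is a deliberate simplification: the genome is a single-stranded DNA string, the RNA is written as the sense-strand sequence with U in place of T, and the function name, coordinates and example sequence are hypothetical.

```python
def retrotranspose(genome, start, end, target):
    """Toy copy-and-paste model of retrotransposition on a DNA string.

    genome[start:end] is the transposon; target is the insertion position.
    """
    element = genome[start:end]
    # Step 1: transcription gives an RNA copy (U replaces T in this shorthand).
    rna = element.replace("T", "U")
    # Step 2: reverse transcription turns the RNA back into a free DNA copy.
    dna_copy = rna.replace("U", "T")
    # Step 3: the DNA copy integrates at the target; the original stays put.
    return genome[:target] + dna_copy + genome[target:]


if __name__ == "__main__":
    genome = "AAAA" + "GATTACA" + "CCCC"   # hypothetical element: GATTACA
    new_genome = retrotranspose(genome, 4, 11, 13)
    print(new_genome)
    print(new_genome.count("GATTACA"), "copies of the element")
```

The genome grows by the length of the element with each event, which is consistent with the observation that interspersed repeats can come to occupy a large fraction of a genome.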

Figure 2.26

Retrotransposition. Compare with Figure 1.19 (page 22), and note that the events are essentially the same as those that result in a processed pseudogene.

Figure 2.27

Retroelements. A comparison of the structures of four types of retroelement. Retroviruses and retrotransposons are LTR elements that possess long terminal repeats at each end. The gag gene codes for a series of proteins located in the virus core; pol …

DNA transposons

Not all transposons require an RNA intermediate. Many are able to transpose in a more direct DNA to DNA manner. With these elements we are aware of two distinct transposition mechanisms (Figure 2.28), one involving direct interaction between the donor transposon and the target site, resulting in copying of the donor element (replicative transposition), and the second involving excision of the element and re-integration at a new site (conservative transposition). Both mechanisms require enzymes which are usually coded by genes within the transposon. The molecular events that occur during these two types of transposition are described in Section 14.3.3.
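The difference in outcome between the two mechanisms can be shown with a minimal Python sketch that treats the genome as a plain string. Only the end result is modeled, not the enzymatic steps. The function names and the example sequence are hypothetical.

```python
def replicative_transposition(genome, start, end, target):
    """Copy genome[start:end] to target; the donor copy remains in place."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


def conservative_transposition(genome, start, end, target):
    """Excise genome[start:end] and re-integrate it at target (cut and paste).

    target is given in coordinates of the original genome.
    """
    if start < target < end:
        raise ValueError("target lies inside the element")
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    if target >= end:
        # Excision shifts everything downstream of the element to the left.
        target -= end - start
    return remainder[:target] + element + remainder[target:]


if __name__ == "__main__":
    genome = "AAAA" + "GATTACA" + "CCCC"   # hypothetical element: GATTACA
    print(replicative_transposition(genome, 4, 11, 13))
    print(conservative_transposition(genome, 4, 11, 13))
```

In this model replicative transposition increases the copy number and the genome length, whereas conservative transposition changes only the position of the element.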

Figure 2.28

Two mechanisms of transposition used by DNA transposons. For more details see Section 14.3.3.

In eukaryotes, DNA transposons are less common than retrotransposons (see Table 1.2), but they have a special place in genetics because a family of plant DNA transposons - the Ac/Ds elements of maize - were the first transposable elements to be discovered, by Barbara McClintock in the 1950s. Her conclusions - that some genes are mobile and can move from one position to another in a chromosome - were based on exquisite genetic experiments, the molecular basis of transposition not being understood until the late 1970s.

Figure 2.29

DNA transposons of prokaryotes. Four types are shown. Insertion sequences, Tn3-type transposons and transposable phages are flanked by short (< 50 bp) inverted terminal repeat (ITR) sequences. The resolvase gene of the Tn3-type transposon codes …

