Retrotransposons and certain DNA transposons are "jumping" sequences that can be incorporated elsewhere in the genomic DNA of an organism through varying mechanisms. Insertion is almost random for retrotransposons and most DNA transposons, but some transposing enzymes show specificity in their selection of insertion sites. In either case, the "copy-paste" mechanism of transposition is essentially a random process (considering the frequency and place of transposition), and therefore the extent of transposition should vary among individuals and among the cells of a single individual. My question is:
Is the extent of transposition random and varied among the different somatic cells of an individual, or is there a regulatory mechanism that limits differences in net DNA content (caused by unequal transposition) by equalizing the extent of transposition in different individuals or in different cells of the same individual? If not, does this unequal C-value in different cells cause any trouble?
Transposition may occur at any time in life and in any cell. Several cases are known where transposition affects somatic cells (see here for a reference). However, the frequency of transposition events is low, so the difference in the amount of DNA between different cells of an individual is small. At the level of different individuals, however, the proportion of the genome affected by transposable elements may be relatively high (see here for a reference).
'Junk' DNA Has Important Role, Researchers Find
Scientists have called it "junk DNA." They have long been perplexed by these extensive strands of genetic material that dominate the genome but seem to lack specific functions. Why would nature force the genome to carry so much excess baggage?
Now researchers from Princeton University and Indiana University who have been studying the genome of a pond organism have found that junk DNA may not be so junky after all. They have discovered that DNA sequences from regions of what had been viewed as the "dispensable genome" are actually performing functions that are central for the organism. They have concluded that the genes spur an almost acrobatic rearrangement of the entire genome that is necessary for the organism to grow.
It all happens very quickly. Genes called transposons in the single-celled pond-dwelling organism Oxytricha produce cell proteins known as transposases. During development, the transposons appear to first influence hundreds of thousands of DNA pieces to regroup. Then, when no longer needed, the organism cleverly erases the transposases from its genetic material, paring its genome to a slim 5 percent of its original load.
"The transposons actually perform a central role for the cell," said Laura Landweber, a professor of ecology and evolutionary biology at Princeton and an author of the study. "They stitch together the genes in working form." The work appeared in the May 15 edition of Science.
In order to prove that the transposons have this reassembly function, the scientists disabled several thousand of these genes in some Oxytricha. The organisms with the altered DNA, they found, failed to develop properly.
Other authors from Princeton's Department of Ecology and Evolutionary Biology include postdoctoral fellows Mariusz Nowacki and Brian Higgins; 2006 alumna Genevieve Maquilan; and graduate student Estienne Swart. Former Princeton postdoctoral fellow Thomas Doak, now of Indiana University, also contributed to the study.
Landweber and other members of her team are researching the origin and evolution of genes and genome rearrangement, with particular focus on Oxytricha because it undergoes massive genome reorganization during development.
In her lab, Landweber studies the evolutionary origin of novel genetic systems such as Oxytricha's. By combining molecular, evolutionary, theoretical and synthetic biology, Landweber and colleagues last year discovered an RNA (ribonucleic acid)-guided mechanism underlying its complex genome rearrangements.
"Last year, we found the instruction book for how to put this genome back together again -- the instruction set comes in the form of RNA that is passed briefly from parent to offspring and these maternal RNAs provide templates for the rearrangement process," Landweber said. "Now we've been studying the actual machinery involved in the process of cutting and splicing tremendous amounts of DNA. Transposons are very good at that."
The term "junk DNA" was originally coined to refer to a region of DNA that contained no genetic information. Scientists are beginning to find, however, that much of this so-called junk plays important roles in the regulation of gene activity. No one yet knows how extensive that role may be.
Instead, scientists sometimes refer to these regions as "selfish DNA" if they make no specific contribution to the reproductive success of the host organism. Like a computer virus that copies itself ad nauseam, selfish DNA replicates and passes from parent to offspring for the sole benefit of the DNA itself. The present study suggests that some selfish DNA transposons can instead confer an important role to their hosts, thereby establishing themselves as long-term residents of the genome.
Materials provided by Princeton University. Original written by Kitta MacPherson. Note: Content may be edited for style and length.
Scientists uncover transfer of genetic material between blood-sucking insect and mammals
Researchers at The University of Texas at Arlington have found the first solid evidence of horizontal DNA transfer, the movement of genetic material among non-mating species, between parasitic invertebrates and some of their vertebrate hosts.
The findings are published in the April 28 issue of the journal Nature, one of the world's foremost scientific journals.
Genome biologist Cédric Feschotte and postdoctoral researchers Clément Gilbert and Sarah Schaack found evidence of horizontal transfer of transposons from a South American blood-sucking bug and a pond snail to their hosts. A transposon is a segment of DNA that can replicate itself and move around to different positions within the genome. Transposons can cause mutations, change the amount of DNA in the cell and dramatically influence the structure and function of the genomes where they reside.
"Since these bugs frequently feed on humans, it is conceivable that bugs and humans may have exchanged DNA through the mechanism we uncovered. Detecting recent transfers to humans would require examining people that have been exposed to the bugs for thousands of years, such as native South American populations," Feschotte said.
Data on the insect and the snail provide strong evidence for the previously hypothesized role of host-parasite interactions in facilitating horizontal transfer of genetic material. Additionally, the large amount of DNA generated by the horizontally transferred transposons supports the idea that the exchange of genetic material between hosts and parasites influences their genomic evolution.
"It's not a smoking gun, but it is as close to it as you can get," Feschotte said.
The infected blood-sucking triatomine causes Chagas disease by passing trypanosomes (parasitic protozoa) to its host. Researchers found the bug shared transposon DNA with some hosts, namely the opossum and the squirrel monkey. The transposons found in the insect are 98 percent identical to those of its mammal hosts.
The researchers also identified members of what Feschotte calls space invader transposons in the genome of Lymnaea stagnalis, a pond snail that acts as an intermediate host for trematode worms, a parasite to a wide range of mammals.
The long-held theory is that mammals obtain genes vertically, or handed down from parents to offspring. Bacteria receive their genes vertically and also horizontally, passed from one unrelated individual to another or even between different species. Such lateral gene transfers are frequent in bacteria and essential for rapid adaptation to environmental and physiological challenges, such as exposure to antibiotics.
Until recently, it was not known horizontal transfer could propel the evolution of complex multicellular organisms like mammals. In 2008, Feschotte and his colleagues published the first unequivocal evidence of horizontal DNA transfer.
Millions of years ago, transposons jumped sideways into several mammalian species. The transposons integrated themselves into the chromosomes of germ cells, ensuring they would be passed on to future generations. Thus, parts of those mammals' DNA did not descend from their common ancestors, but were acquired laterally from another species.
The actual means by which transposons can spread across widely diverse species has remained a mystery.
"When you are trying to understand something that occurred over thousands or millions of years ago, it is not possible to set up a laboratory experiment to replicate what happened in nature," Feschotte said.
Instead, the researchers made their discovery using computer programs designed to compare the distribution of mobile genetic elements among the 102 animals for which entire genome sequences are currently available. Paul J. Brindley of George Washington University Medical Center in Washington, D.C., contributed tissues and DNA used to confirm experimentally the computational predictions of Feschotte's team.
When the human genome was sequenced a decade ago, researchers found that nearly half of the human genome is derived from transposons, so this new knowledge has important ramifications for understanding the genetics of humans and other mammals.
Feschotte's research is representative of the cutting edge research that is propelling UT Arlington on its mission of becoming a nationally recognized research institution.
Mosses are one of the oldest groups of land plants, forming a sister clade with vascular plants (Leebens-Mack et al., 2019). Since the demonstration, in 1997, that gene targeting via homologous recombination was possible in Physcomitrium (Physcomitrella) patens (Schaefer and Zrÿd, 2001), this moss has become a leading plant model for answering essential questions in life sciences and in particular for understanding the evolution of biological processes of land plants. The draft of the P. patens genome was published in 2008 (Rensing et al., 2008), and a chromosome-scale assembly of the P. patens genome has been published (Lang et al., 2018), highlighting the similarities and differences with other plant genomes. Transposable elements (TEs) account for 57% of the 462.3 Mb of the assembled P. patens genome. This TE coverage is not very different from that of other plant genomes of similar size (Tenaillon et al., 2010). In contrast, the distribution of TEs in P. patens is unusual compared to other plants. TE-rich regions alternate with gene-rich regions all along the P. patens chromosomes (Lang et al., 2018), whereas in most plant genomes TEs accumulate in pericentromeric heterochromatic regions on each chromosome. Interestingly, in spite of the general patchy TE distribution, a family of retrotransposons of the copia superfamily, RLC5 (comprising full-length elements, from now on RLC5, and truncated elements, tRLC5), clusters at a single location in each chromosome that could correspond to the centromere (Lang et al., 2018). The TE-rich regions distributed all along the chromosomes are mainly composed of a single family of LTR-retrotransposons of the gypsy superfamily named RLG1 (Lang et al., 2018). RLG1 integrase contains a chromodomain, a type of protein domain that has been previously found to direct retrotransposon integration into heterochromatin (Gao et al., 2008), suggesting that RLG1 could target heterochromatic TE islands for integration.
Although most TE copies are located in heterochromatic TE islands, gene-rich regions also contain some TE copies, some of which inserted recently and are polymorphic between the Gransden and Villersexel accessions (Lang et al., 2018). Moreover, the RLG1 retrotransposon is transcribed in P. patens protonema cells, suggesting that it can transpose during P. patens development (Vives et al., 2016; Lang et al., 2018). Although these data suggest that TE activity may have shaped the genome of P. patens and may continue to generate variability that potentially impacts P. patens evolution, a global analysis of the capacity of P. patens TEs to be expressed and to transpose is still lacking. Here we present an unbiased analysis of TE expression in P. patens based on RNA-Seq analyses and confirmed by qRT-PCR, which has uncovered the developmentally regulated or stress-related expression of different TE families, including class I (retrotransposons) and class II (DNA transposons) TEs. The data presented here reinforce the idea that TEs have shaped the genome of P. patens and show that they continue to drive its evolution.
The chromosome-scale assembly of the wheat genome provided an unprecedented genome-wide view of the organization and impact of TEs in such a complex genome. Since they diverged, the A, B, and D subgenomes have experienced a near-complete TE turnover, although polyploidization did not massively reactivate TEs. This turnover contrasted drastically with the high level of gene synteny. Apart from genes, there was no conservation of the TE space between homeologous loci. But surprisingly, TE families that have shaped the A, B, and D subgenomes are the same, and unexpectedly, their proportions and intrinsic properties (gene-prone or not) are quite similar despite their independent evolution in the diploid lineages. Thus, TE families have somehow been at equilibrium in the genome since the A-B-D common ancestor. These novel insights contradict the previous model of evolution with amplification bursts followed by rapid silencing. Our results suggest a role of TEs at the structural level. TEs are not just “junk DNA”; our findings open new perspectives to elucidate their role in high-order chromatin arrangement, chromosome territories, and gene regulation.
Transposons Identified as Likely Cause of Undiagnosed Diseases
Jan 13, 2020
When Wellcome Sanger Institute geneticist Eugene Gardner set out to look for a specific type of genetic mutation in a massive database of human DNA, he figured it’d be a long shot. Transposons, also known as jumping genes because they can move around the genome, create a new mutation in one of every 15 to 40 human births, but that’s across the entire 3 billion base pairs of nuclear DNA that each cell carries. The sequencing data that Gardner was working with covered less than two percent of that, with only the protein-coding regions, or exons, included. Doing a quick calculation, he determined that, in the best-case scenario, he could expect to find up to 10 transposon-generated variants linked to a developmental disease. And “we really might get zero,” he says. “This whole thing might be for naught.”
But Gardner had recently developed the perfect tool to find the sort of de novo mobile element insertions that come about as a result of transposon movements and are often overlooked in genetic screens and analyses. As a graduate student in Scott Devine’s lab at the University of Maryland, Baltimore’s Institute for Genome Sciences, he had spent many hours making the software for the mobile element locator tool he dubbed MELT. The program was easy to use, so when Gardner moved across the Atlantic for a postdoc in Matthew Hurles’s lab at Sanger near Cambridge and gained access to a database of exomes from 13,000 patients with developmental disorders, he figured running the tool was worth a try.
Of those 13,000, Gardner focused on 9,738 people in the Deciphering Developmental Disorders (DDD) study whose parents’ exomes had also been sequenced, making it easier to single out variants present in the child but not in mom and dad. And as it turned out, he did get some hits. MELT picked up 40 potentially transposon-generated variants, which Gardner sat down at his computer to review using the raw sequencing data. Nine appeared to be true de novo mobile element insertions. “I remember being in my desk doing the visualization of all the putative de novo variants after I got the first results off the pipeline,” he recalls. “I remember being excited: I think I might have found a diagnostic de novo!”
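The "quick calculation" mentioned earlier can be approximated with a back-of-envelope sketch. This is not Gardner's actual calculation: it assumes insertions land uniformly across the genome, takes 2 percent as the exome's share (the article says "less than two percent"), and uses the 9,738 trios as the cohort. The function name is hypothetical.

```python
def expected_exonic_insertions(n_births, births_per_insertion, exome_fraction):
    """Expected de novo insertions that land in exons.

    Assumption (ours, not the study's): insertions fall uniformly across
    the genome, so the exonic share equals the exome's share of the genome.
    """
    return n_births * (1.0 / births_per_insertion) * exome_fraction


N_TRIOS = 9738          # children with both parents sequenced (from the article)
EXOME_FRACTION = 0.02   # upper bound; the article says "less than two percent"

# One new insertion per 40 births (low end) to one per 15 births (high end).
low = expected_exonic_insertions(N_TRIOS, 40, EXOME_FRACTION)
high = expected_exonic_insertions(N_TRIOS, 15, EXOME_FRACTION)
print(f"expected exonic de novo insertions: {low:.1f} to {high:.1f}")
```

Under these assumptions the estimate comes out at roughly 5 to 13 exonic insertions, a range that brackets the nine de novo insertions MELT confirmed. Only some of those would be expected to fall in disease-relevant genes.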
Discussing the literature on the genes affected by such insertions with clinicians and other colleagues, Gardner narrowed the list down to four insertions found in genes that may be causing or contributing to four different patients’ disorders. He sent these results off to the physicians who had referred each of the patients to the database, and all the doctors confirmed that the results made sense to them given what had been published on those genes and what they knew about other cases involving patients with mutations in the same sequences. In one case, the physician had already linked the patient’s disorder to the gene Gardner had identified; in the other three cases, the patients were still undiagnosed.
“There is tremendous value for these families that get a diagnosis,” says human geneticist Dan Koboldt, who has collaborated with Hurles in the past and has used MELT in his studies of rare disease at the Steve and Cindy Rasmussen Institute for Genomic Medicine at Nationwide Children’s Hospital in Columbus, Ohio, but who was not involved in Gardner’s recent study. A genetic answer not only can help physicians connect patients to appropriate medical and counseling resources; it also puts an end to the diagnostic odyssey that families affected by rare disease often endure.
What’s more, the finding of four potentially causative hits out of the nearly 10,000 cases provides a first estimate of how commonly such mobile element insertions underlie developmental disorders. “What’s interesting about this study is that it’s taking a very broad approach,” says Ian Adams, a developmental biologist at the University of Edinburgh’s MRC Human Genetics Unit who was not involved in the research. Rather than look for transposon activity in a specific disorder, “it’s casting a much broader net in trying to find what type of diseases this class of mutations could be contributing to.”
This approach is important, agrees Adams’s MRC Human Genetics Unit colleague Jose Garcia-Perez, a transposable elements expert who was also not involved in the new research. In the last few years, two studies have used a tool developed around the same time as MELT to search for de novo mobile elements in people with autism spectrum disorder, but neither identified any that were likely to be responsible for the patients’ symptoms. “[Gardner’s] study shows that, no matter what’s [been found] recently, it’s something that should be explored in further detail in the future,” says Garcia-Perez. “[The study] actually shows a real connection between . . . transposition with that particular [type of] disorder.” Koboldt adds: “The reason this is an important study is that it establishes [that these] variants do occur and [that] they can be pathogenic.”
Gardner says he hopes that his methods can be used to explore other diseases, from both a research and a clinical perspective. Adams says MELT does appear to be “widely applicable to other datasets.” Such a tool could be a boon to research on transposons, given that their movements are often missed by normal screening tools, Adams adds. “I think [MELT is] something that could be readily built into existing pipelines.”
Non-coding information in LTR retrotransposons
Variation in retrotransposon genomic organization is not limited to the presence or absence of coding information. Some retrotransposons contain a large amount of conserved non-coding sequence. The barley LARD element with 3.5 kb of non-coding DNA (mentioned above) is one example; another is a group of plant metaviruses that carry several kilobases of non-coding DNA between pol and the 3' LTR. Among these are the maize Cinful and Grande1 elements, RIRE2 from rice and Tat1 from Arabidopsis. For Grande1 and RIRE2, antisense ORFs have been described, but they do not account for the entire segment of non-coding DNA [21, 22]. In addition, many retrotransposons, including the Grande1 and Cinful elements, have a series of short tandem repeats very close to the 3' end of the pol gene, or at a putative pol-env junction. This may suggest a potential function for the tandem repeats: they may facilitate recombination and acquisition of new coding information through gene transduction. In support of this hypothesis, repeated non-coding information seems to be found between the env-like ORF and the 3' LTR in both the SIRE1 and Athila retrotransposons. In the retrotransposons with env-like ORFs, the repeats show similarity to polypurine tracts, suggesting that they might instead have a role in reverse transcription.
The sequenced eukaryotic genomes have provided a new appreciation of the diversity among LTR retrotransposons. As sequence data accumulate, additional novel elements are likely to be revealed. The challenge in the future will be to understand how diversity in retrotransposon genome organization and coding sequences reflects differences in retrotransposition mechanisms and strategies employed by these elements to colonize their host genomes.
Understanding how to control 'jumping' genes
ATXR5 co-localized with Serrate in the nucleus. Credit: Texas A&M University
A team of Texas A&M University and Texas AgriLife Research scientists have made a new discovery of how a single protein, Serrate, plays dual roles in controlling jumping genes.
The work will greatly help scientists manipulate gene expression for breeding better crops as well as design a more efficient therapeutic strategy for curing human disease, according to the scientists.
Drs. Xiuren Zhang and Zeyang Ma, along with a team of scientists in the department of biochemistry and biophysics and Institute for Plant Genomics and Biotechnology at Texas A&M in College Station, have their findings published online in the journal Developmental Cell.
Jumping genes, or transposons, were discovered by maize geneticist Barbara McClintock in the 1940s. These genes change position or "jump" along the genome and make up a large portion of genomic DNA: more than 40 percent of the human genome and up to 90 percent of the genomes of certain plants.
"For years, they had been thought of as useless or 'junk' DNA," Zhang said. "However, it has been recently known that transposons also play very important roles in gene regulation and evolution regardless of the potential deleterious effect."
The team has been working for years on the functions of Serrate in the model plant Arabidopsis, a small weed that is a popular tool for understanding the molecular biology of many plant traits. Unexpectedly, they found that Serrate, best known for its role in processing microRNA and messenger RNA, is also involved in the regulation of jumping gene expression.
Their work involved isolating Serrate protein complexes. They found that the complexes contain Arabidopsis Trithorax-related proteins known as ATXR5/6. These are the enzymes that add the histone 3 K27 monomethylation, or H3K27me1, mark to the chromatin, which serves as a home for DNA, including transposon genes.
"H3K27me1 acts to repress transposon expression," Ma said. "That means the more the H3K27me1 mark, the less the expression of transposon genes. Because of this, transposon expression level is increased in the plants that lack ATXR5/6 genes."
The scientists then used a plant that harbors a mutation of the Serrate gene and surveyed the genome-wide level of the H3K27me1 mark. They found that this mark on the chromatin was decreased in the Serrate mutant, which means that Serrate promotes ATXR5/6 to add the H3K27me1 mark and inhibits the production of the transposon transcripts.
"One would expect that the transposon transcripts would be highly accumulated in the plants that lack Serrate gene," Ma said. When team members measured the levels of transposon transcripts, to their big surprise they did not see the increase. Rather, transposon expression was shut down again when the plants concurrently lacked the ATXR5/6 and Serrate genes.
To solve this paradox, the scientists proposed that Serrate must have an additional role besides promoting ATXR5/6 enzymatic activity in adding H3K27me1 marks.
After screening several potential candidates, they pinpointed RNA-Dependent RNA Polymerase 6, or RDR6. RDR6 is a protein that can reduce the transposon transcript amount once the transcripts are made. This regulation pathway is also called post-transcriptional gene silencing.
When the scientists introduced the RDR6 mutation into the plants that lack the Serrate and ATXR5/6 genes, they found that transposon transcripts were again highly accumulated, as observed in the plants that lack only the ATXR5/6 genes.
"That means, Serrate protects transposon transcripts from being silenced by RDR6," Ma said. "Altogether, we found that a single protein has totally opposite functions in different regulation steps to control the net amount of the target."
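The genetic logic described above can be condensed into a toy Boolean model. This is an illustrative simplification, not the authors' model: it treats each gene as simply present or absent, treats the H3K27me1 mark as requiring both ATXR5/6 and Serrate (the article reports only that the mark is decreased in the Serrate mutant), and all function and variable names are hypothetical.

```python
def transposon_transcripts_high(atxr56, serrate, rdr6):
    """Toy Boolean model of transposon transcript accumulation.

    Assumptions (ours): genes are either fully present (True) or absent
    (False), and H3K27me1-based repression needs both ATXR5/6 and Serrate.
    """
    # Transcriptional repression: ATXR5/6 deposits H3K27me1, promoted by Serrate.
    repressed_by_mark = atxr56 and serrate
    # Post-transcriptional silencing: RDR6 acts unless Serrate protects transcripts.
    silenced_by_rdr6 = rdr6 and not serrate
    return (not repressed_by_mark) and (not silenced_by_rdr6)


# (ATXR5/6 present, Serrate present, RDR6 present) for each genotype in the article.
genotypes = {
    "wild type": (True, True, True),
    "atxr5/6 mutant": (False, True, True),
    "serrate mutant": (True, False, True),
    "atxr5/6 serrate mutant": (False, False, True),
    "atxr5/6 serrate rdr6 mutant": (False, False, False),
}

results = {name: transposon_transcripts_high(*g) for name, g in genotypes.items()}
for name, high in results.items():
    print(f"{name}: transcripts {'high' if high else 'low'}")
```

In this sketch, transcripts are high only in the atxr5/6 mutant and in the triple mutant, which matches the pattern reported in the article.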
"It totally makes sense if you take a close look at the function of transposons because they are important but also harmful to the host if their expression level is too high," Ma said. "Transposon genes must be tightly controlled by balanced forces to allow low but essential expression. I'm still very much impressed by the beautiful and elegant natural design that the plant uses a single protein to fine tune the gene expression level."
Zhang said Serrate and its homolog arsenic resistant gene 2 (ARS2) have a vital role in the biology of plants and humans.
In plants, loss of Serrate leads to reduced stature, deformed leaves and altered responses to environmental stresses. In humans, knockout of ARS2 results in embryonic lethality, and altered ARS2 expression is linked to several kinds of disorders in the bone marrow and neural stem cells.
Establishing a connection between Serrate and transposon silencing not only represents a significant advancement in the field, but also provides new ideas to improve agricultural traits and to tackle human diseases in the long run.
Mechanisms of Histone Modifications
Ludovica Vanzan, ... Rabih Murr, in Handbook of Epigenetics (Second Edition), 2017
The genome size is, on average, significantly larger in eukaryotes than in prokaryotes. On the one hand, this contributed to the acquisition of a number of advantages. On the other hand, it increased the complexity of regulation of the activity, maintenance, and inheritance of the genomic material. To deal with this complexity, eukaryotic organisms acquired new proteins, called histones, that not only allow packaging and protecting the DNA in a higher-order structure called chromatin, but can also regulate the accessibility to and the activity of different parts of the genome. This regulatory function is directed by multiple chemical modifications that can take place on the histones. Here, we introduce the most common histone modifications, the machineries that deposit and remove them, their distribution in the genome, and their role in key cellular processes, with a focus on transcriptional regulation and DNA repair.
Extended Data Figure 1 M. pusilla introner elements are in phase with nucleosome linker DNA, even without methylation.
Unmethylated regions (indicated by the line with arrowheads) are defined as containing no base positions with fractional methylation 0.5 or greater in a window starting from 50 bp upstream of the 5′ end of the introner element intron and continuing 234 bp downstream, which is 50 bp beyond the predominant M. pusilla introner element intron size of 184 bp (Fig. 2a). Mean values at each base position are shown for chromatin maps (ref. 12) aligned to the subset (7%) of introner element introns residing in unmethylated regions (dark grey and dark blue for nucleosome centres and DNA methylation, respectively), compared with alignment to all introner element introns (light grey and light blue; same data as in Fig. 1b for introner element introns). On the other hand, to assess whether introner elements could be in phase with methylated regions that are not also nucleosome linkers, we looked for introner elements that had both ends in methylated DNA regions (ref. 12) but not in nucleosome linkers, which gave 35 potential candidates (1% of introner elements). Manual inspection revealed that 34 of the 35 candidates apparently nonetheless have ends in nucleosome linkers, simply being missed by the filtering criteria we used for calling linkers. This leaves one candidate, indicating little evidence that DNA methylated regions that are not also nucleosome linkers are found at introner element ends. Thus, unmethylated nucleosome linkers could be the primary determinant of introner element insertion in at least some cases, whereas we find virtually no evidence that methylated regions could be the primary determinant of introner element insertion without also being nucleosome linkers.
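The window rule in this legend can be written as a short check. This is a minimal sketch under stated assumptions, not the authors' code: methylation is represented as a hypothetical dictionary from genomic position to fractional methylation, positions without data are treated as unmethylated, the intron is assumed to lie on the forward strand, and the window is taken as half-open.

```python
def is_unmethylated_region(methylation, intron_start,
                           upstream=50, downstream=234, threshold=0.5):
    """Return True if no position in the window reaches the methylation threshold.

    methylation: dict mapping position -> fractional methylation (0.0 to 1.0).
    Assumptions (ours): missing positions count as 0.0, forward strand only,
    window is [intron_start - upstream, intron_start + downstream).
    """
    window = range(intron_start - upstream, intron_start + downstream)
    return all(methylation.get(pos, 0.0) < threshold for pos in window)


# Made-up example data: one strongly methylated base at position 100.
example = {100: 0.9, 460: 0.2}
inside = is_unmethylated_region(example, intron_start=120)   # window covers 100
outside = is_unmethylated_region(example, intron_start=500)  # window misses 100
print(inside, outside)
```

The default of 234 bp downstream follows the legend: the predominant intron size of 184 bp plus a 50 bp margin.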
Extended Data Figure 2 A. anophagefferens introner elements insert into pre-existing nucleosome linkers.
a, Introner element (IE) introns are generally in phase with nucleosome positions, whereas other introns are not. DNA methylation (ref. 12) was aligned to the 5′ ends of introner element introns (dark blue) or other introns (light blue). We did not generate nucleosome data previously for A. anophagefferens, but DNA methylation is a reliable indicator of linker locations (ref. 12). b, Introner elements are in phase with the starts of genes, indicating insertion between pre-existing nucleosomes. The 5′ ends of introner element introns and DNA methylation (ref. 12) were aligned to gene starts. A kernel density estimate of introner element ends is displayed with peaks marked by vertical broken lines.
Extended Data Figure 3 Target site duplications (TSDs) at introner element introns.
a, c, Intron sequences contain directly repeated sequences at their ends. Each A. anophagefferens (a) and M. pusilla (c) intron 5′ and 3′ end is directly aligned in each possible offset from −10 to 10 bp apart. Positions relative to the 5′ splice site from 10 bp upstream to 10 bp downstream are shown. Introner element (IE) introns are shown on the left and other regular non-introner element introns are in the centre, and the differences obtained by subtracting the identity percentages of other introns from those of introner element introns are on the right. Each panel is separated by a vertical black line and a diagonally stepped black line to delineate different regions: the upper left region represents alignment of upstream exon versus 3′ intron end sequence; the upper right represents 5′ intron end versus 3′ intron end; the lower right represents 5′ intron end versus downstream exon; and the lower left represents upstream exon versus downstream exon. The red arrowheads on the right indicate the offset with maximum average identity (0 in both cases). The red boxes in the right panels highlight the identified TSD length and position (see Supplementary Discussion). b, d, An example of an aligned 5′ (above) and 3′ (below) intron end of an introner element for the offset with maximum identity is shown in b for A. anophagefferens and d for M. pusilla. Exonic sequence is uppercase and boxed; intronic is lowercase. Vertical lines show identities that are part of at least an identical 2-mer, with the red lines corresponding to the boxed regions in a and c.
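The offset scan described in this legend can be sketched as follows. This is an illustration only, not the authors' pipeline: it collapses the per-position detail of the figure into a single identity score per offset, requires a minimum overlap (our choice), and uses made-up sequences and hypothetical function names.

```python
def identity_at_offset(five_end, three_end, offset, min_overlap=5):
    """Fraction of identical bases when five_end[i] is paired with three_end[i + offset].

    Returns None when fewer than min_overlap positions overlap (an assumption
    made here to avoid scoring tiny overlaps).
    """
    pairs = [
        (five_end[i], three_end[i + offset])
        for i in range(len(five_end))
        if 0 <= i + offset < len(three_end)
    ]
    if len(pairs) < min_overlap:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


def mean_identity_by_offset(junction_pairs, max_offset=10):
    """Average identity at each offset over many (five_end, three_end) windows."""
    profile = {}
    for offset in range(-max_offset, max_offset + 1):
        scores = [identity_at_offset(f, t, offset) for f, t in junction_pairs]
        scores = [s for s in scores if s is not None]
        if scores:
            profile[offset] = sum(scores) / len(scores)
    return profile


# Made-up junction windows in which the 3' window repeats the 5' window, shifted by 2.
windows = [("ACGTACGTAA", "TTACGTACGT")]
profile = mean_identity_by_offset(windows, max_offset=5)
best_offset = max(profile, key=profile.get)
print(best_offset, profile[best_offset])
```

On real data the list would hold one pair of junction windows per intron, and the offset with the highest mean identity would correspond to the arrowheads in the figure.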
Extended Data Figure 4 Terminal inverted repeats (TIRs) in introner element introns.
a, c, Intron end sequences contain inverted repeats. Each A. anophagefferens (a) and M. pusilla (c) intron 5′ and reverse of the 3′ end is aligned in each possible offset from −30 to 30 bp apart. Positions relative to the 5′ splice site from 30 bp upstream to 30 bp downstream are shown. Introner element (IE) introns are shown on the left and other regular non-introner element introns are on the right. In each panel the upper left region represents upstream exon versus downstream exon sequence, the upper right represents 5′ intron end versus downstream exon, the lower right represents 5′ intron end versus 3′ intron end, and the lower left represents upstream exon versus 3′ intron end. The red arrowheads (right) indicate the offset with maximum average complementarity. b, d, An example of an aligned 5′ (top) and 3′ (bottom, reversed so that it is 3′ to 5′) end of an introner element intron for the offset with maximum complementarity is shown in b for A. anophagefferens (offset of +8) and d for M. pusilla (offset of −5). Exonic sequence is uppercase and boxed; intronic is lowercase. Vertical lines show complementarities that are part of at least an identical 2-mer.
Extended Data Figure 5 Intron gain templated by nucleosomes and co-opted sequences.
Model for intron generation by introner elements acting as short non-autonomous DNA transposons that carry a splice site and insert between nucleosomes with co-option of the other splice site sequence.
Extended Data Figure 6 Diploid genomic sequence variation in a more recent isolate of A. anophagefferens.
a, Calling of sequence variation from genomic sequencing reads without an assumption of ploidy reveals a peak at an alternate allele fraction of approximately 0.5. The most likely scenario is that this A. anophagefferens isolate has a diploid genome. It is not physically plausible for it to have higher ploidy because that amount of chromatin could not fit into its extremely compact nucleus (ref. 12). b, An example reference introner element (IE) is present within one allele and absent from the alternate allele. The locus is displayed as in Fig. 3a. The reference introner element is located in an annotated protein-coding gene with a 200-bp RNA sequencing-validated intron in the reference isolate. The alternate allele is probably exonic without an intron (broken lines), so that it encodes the same amino acid sequence. The TSD within the reference allele is 8 bp, immediately flanking the introner element TIRs. c, An example introner element not found within the reference allele is present within the alternate allele. The locus is displayed as in Fig. 3a. The alternate introner element is within an annotated protein-coding gene with a predicted 200-bp intron (broken lines). If the predicted intron is indeed spliced out of the RNA, then the alternate allele encodes the same amino acid sequence. The TSD within the alternate allele is 8 bp, immediately flanking the introner element TIRs.
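The reasoning in panel a rests on the alternate allele fraction: at a heterozygous site in a diploid genome, roughly half of the reads should carry the alternate allele. A minimal sketch of that calculation follows. The read counts are invented for illustration, and the function names and binning scheme are assumptions rather than the method used in the paper.

```python
from collections import Counter


def alt_allele_fraction(ref_reads, alt_reads):
    """Fraction of reads at a variant site that support the alternate allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no covering reads")
    return alt_reads / total


def modal_fraction(sites, n_bins=10):
    """Round each site's fraction to the nearest 1/n_bins and return the
    most common rounded value (a crude stand-in for the histogram peak)."""
    bins = Counter(
        round(alt_allele_fraction(ref, alt) * n_bins) for ref, alt in sites
    )
    peak_bin, _ = bins.most_common(1)[0]
    return peak_bin / n_bins


# Invented (reference reads, alternate reads) counts for four variant sites.
example_sites = [(10, 10), (12, 11), (9, 10), (30, 1)]
```

For these invented counts, modal_fraction(example_sites) is 0.5, the pattern expected for heterozygous sites in a diploid. A triploid or tetraploid genome would instead be expected to show peaks near 0.33 or 0.25.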
Extended Data Figure 7 Splice site sequences.
Logos for the 10 bp upstream and downstream of 5′ and 3′ splice sites for introner element and other introns are shown for each organism. The rectangles show exonic positions. The core splice sites are GY (Y is C or T) and AG. Introner elements (IEs) combined with co-opted exonic sequence that is duplicated (Fig. 3) to generate particular sequences that extend beyond the core sites (bracketed). Specifically, this results in a predominance of AG|GY sequences (| denotes the position of splicing that ultimately occurs) at 5′ splice sites in M. pusilla introner element introns and 3′ splice sites in A. anophagefferens introner element introns. Similar respective sequences are observed in other introns in each organism: G|GT for M. pusilla 5′ splice sites and AG|G for A. anophagefferens 3′ splice sites. In non-introner element introns, these sequences have been under selection for long periods of time to promote RNA splicing, revealing the sequences extending beyond core sites that probably contribute to optimal splicing in each organism. The similarity of introner element intron splice sites to other intron splice sites thus suggests that introner elements in each organism generate new introns that are spliced reasonably well.
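As a small illustration of the core splice sites named in this caption (GY at the 5′ end of the intron, AG at the 3′ end), the check below tests whether an intron sequence carries both. The function name and the example sequences are hypothetical, and the extended AG|GY context discussed above is not evaluated.

```python
def has_core_splice_sites(intron):
    """True if the intron starts with GY (GT or GC) and ends with AG."""
    seq = intron.upper()
    return (
        len(seq) >= 4
        and seq[0] == "G"
        and seq[1] in ("C", "T")
        and seq.endswith("AG")
    )


# Hypothetical intron sequences, lowercase as in the figure panels.
examples = ["gtaagtttcag", "gcaaag", "gaaaag", "gtaaac"]
results = [has_core_splice_sites(seq) for seq in examples]
```

Here the first two sequences pass and the last two fail: one lacks a pyrimidine at the second position and the other lacks the terminal AG.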
Extended Data Figure 8 Most introner elements are located in genes expressing low to average RNA levels.
Distributions of detectable RNA levels of all transcripts (black) and only those containing at least one introner element (IE-containing, green) are shown as measured by RNA sequencing. Box plots indicate the median, first and third quartiles with whiskers extending up to data 1.5 times the interquartile range away from the box. For M. pusilla, introner element-containing gene expression does not differ significantly from that of all genes, P = 0.59. For A. anophagefferens, introner element-containing gene expression is slightly lower than that of all genes, P = 0.041.
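The box plot convention described here (median, first and third quartiles, whiskers reaching the most extreme data within 1.5 times the interquartile range of the box) can be computed as in the sketch below. The expression values are invented, the function name is an assumption, and the quartile method is one of several in common use, so plotting software may give slightly different quartiles.

```python
import statistics


def box_plot_stats(values):
    """Median, quartiles, whisker ends and outliers for a box plot whose
    whiskers extend to data within 1.5 x IQR of the box."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Invented RNA levels with one extreme value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For this invented data the quartiles are 3, 5 and 7, the whiskers end at 1 and 8, and the value 100 is drawn as an outlier beyond the upper whisker.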