Chromosome size without heterochromatin

I'm doing various analyses of the human chromosomes and different loci. However, across the databases I use, the heterochromatin regions are not part of the human genome assemblies. I know that heterochromatin is a denser chromatin structure that stains differently and is not part of the reference genomes.

However, all the information I can find on human chromosome sizes includes the heterochromatin regions.

So I would like to know: is there anywhere I can find the lengths of the human chromosomes without the heterochromatin regions?

Thanks for your time and help.

There are a couple of caveats that need to be addressed before answering this question:

  1. We don't know all of the regions in the human genome that can be heterochromatic. In some cases large chunks of heterochromatin are simply missing from the genome assembly. For a somewhat outdated but still informative review of the topic see here.
  2. Which regions are heterochromatic differs from cell to cell. For example, I would recommend reading about the differences between constitutive and facultative heterochromatin on Wikipedia.
  3. Heterochromatic regions of the genome are very important biologically and not taking them into account could be a mistake, though it depends on your motivations.

The Human Genome Project initially estimated that 92% of its original genome build was euchromatic, 2.85 Gbp in total. That build has since been substantially updated, and we have learned a lot about genome structure and biology since then, but the estimate is probably close enough for many purposes.
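As a rough sanity check on those two numbers, here is a small arithmetic sketch. It assumes that the 2.85 Gbp figure is the euchromatic portion and that it corresponds to the quoted 92%; if the figures are meant differently, the implied values change accordingly.

```python
# Assumption (not stated explicitly in the answer): 2.85 Gbp is the
# euchromatic portion, and it makes up 92% of the genome.
euchromatic_gbp = 2.85
euchromatic_fraction = 0.92

# Total size implied by those two figures, and the remainder that
# would not be euchromatic under the same assumption.
implied_total_gbp = euchromatic_gbp / euchromatic_fraction
implied_remainder_gbp = implied_total_gbp - euchromatic_gbp

print(f"implied total: {implied_total_gbp:.2f} Gbp")        # 3.10 Gbp
print(f"implied remainder: {implied_remainder_gbp:.2f} Gbp")  # 0.25 Gbp
```

So, on this reading, roughly a quarter of a gigabase falls outside the euchromatic estimate.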

We do have a lot of data that is highly correlated with what we think of as heterochromatin. The UCSC human genome browser has a lot of information about chromatin modifications (I would look at the ENCODE analysis section here). In principle you can make educated guesses about regions of heterochromatin based on chromatin modifications. This will be somewhat noisy, and you will have to make decisions about which cell types are relevant, as they will have somewhat different regions.

For more information about which chromatin modifications might be relevant, you might start looking here. You will likely need to analyze some data yourself to find a more precise answer.
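Once you have decided which regions to treat as heterochromatic (or as assembly gaps), the length calculation itself is simple interval arithmetic. The following is a minimal sketch, not a definitive method: the function names and the toy coordinates are hypothetical, and the intervals are assumed to be half-open, 0-based `(start, end)` pairs as in BED-style files. Overlapping regions are merged first so that no base is subtracted twice.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (assumed BED-like)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends (or is contained in) the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def length_without_masked(chrom_length: int, masked: List[Interval]) -> int:
    """Chromosome length minus the bases covered by masked regions."""
    # Clip to the chromosome bounds and drop empty intervals.
    clipped = [(max(0, s), min(chrom_length, e)) for s, e in masked]
    clipped = [(s, e) for s, e in clipped if s < e]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return chrom_length - covered


# Toy example with made-up coordinates (not real chromosome data).
toy_length = 1000
toy_masked = [(100, 200), (150, 300), (900, 1100)]
print(length_without_masked(toy_length, toy_masked))  # 700
```

In practice you would load the chromosome lengths and the masked intervals from whatever annotation you settle on, and run this once per chromosome; the result is only as good as the heterochromatin calls you feed it.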

Assembly and characterization of heterochromatin and euchromatin on human artificial chromosomes

Human centromere regions are characterized by the presence of alpha-satellite DNA, replication late in S phase and a heterochromatic appearance. Recent models propose that the centromere is organized into conserved chromatin domains in which chromatin containing CenH3 (centromere-specific H3 variant) at the functional centromere (kinetochore) forms within regions of heterochromatin. To address these models, we assayed formation of heterochromatin and euchromatin on de novo human artificial chromosomes containing alpha-satellite DNA. We also examined the relationship between chromatin composition and replication timing of artificial chromosomes.


Heterochromatin factors (histone H3 lysine 9 methylation and HP1α) were enriched on artificial chromosomes estimated to be larger than 3 Mb in size but depleted on those smaller than 3 Mb. All artificial chromosomes assembled markers of euchromatin (histone H3 lysine 4 methylation), which may partly reflect marker-gene expression. Replication timing studies revealed that the replication timing of artificial chromosomes was heterogeneous. Heterochromatin-depleted artificial chromosomes replicated in early S phase whereas heterochromatin-enriched artificial chromosomes replicated in mid to late S phase.


Centromere regions on human artificial chromosomes and host chromosomes have similar amounts of CenH3 but exhibit highly varying degrees of heterochromatin, suggesting that only a small amount of heterochromatin may be required for centromere function. The formation of euchromatin on all artificial chromosomes demonstrates that they can provide a chromosome context suitable for gene expression. The earlier replication of the heterochromatin-depleted artificial chromosomes suggests that replication late in S phase is not a requirement for centromere function.


How are genomes organized in the nucleus, and what is the role of genome organization in cellular functions? These fundamental questions in cell biology are attracting increased attention as the genomes of higher eukaryotes are being sequenced. Diverse models, ranging from highly random to highly organized, have been proposed for the organization of interphase chromatin (for reviews see Manuelidis, 1990; Haaf and Schmid, 1991; Cremer et al., 1993). Recent evidence suggests that interphase chromatin is organized in large loops, several megabase pairs in size (Sachs et al., 1995; Yokota et al., 1995; Ostashevsky, 1998). While within each loop chromatin is randomly folded, specific loop-attachment sites may impose a constrained backbone structure (Yokota et al., 1995; Marshall et al., 1997; Ostashevsky, 1998, 2000; Cremer et al., 2000).

At present, it is well established that both mitotic chromosomes and interphase chromatin are composed of distinct functional domains (for recent reviews, see Cockell and Gasser, 1999; Belmont et al., 1999; Cremer et al., 2000). Each domain occupies a specific spatial position and replicates at a precise time during S phase. In metaphase chromosomes, the domains are identified as alternate transverse bands along the chromosome length (reviewed by Sumner, 1990). Shortly after mitosis, the chromosomal domains decondense and are repositioned in the nucleus, where they are designated as either euchromatin or heterochromatin (for reviews see Manuelidis, 1990; Haaf and Schmid, 1991; Craig and Bickmore, 1993). Heterochromatin represents chromatin that remains condensed throughout the cell cycle except during its replication, which occurs late in S phase. Heterochromatin includes constitutive heterochromatin, which is almost entirely composed of noncoding, tandemly repeated, satellite DNA sequences, and facultative heterochromatin, which mainly consists of potentially transcribable genes. Constitutively heterochromatic regions on metaphase chromosomes are designated C bands and are mostly localized at or adjacent to centromeric regions, whereas facultative heterochromatin resides in so-called G-dark bands (Craig and Bickmore, 1993). The G-dark bands comprise tissue-specific genes that are transcribed only in selected cell types (Manuelidis, 1990). Housekeeping genes, which are early replicating and actively transcribed in almost all cells, reside on G-light bands (also called R bands). During interphase, the vast majority of late replicating bands from most (if not all) chromosomes are localized at the nuclear periphery, with a smaller fraction present around the nucleolus or scattered in the nucleoplasm. In contrast, early replicating G-light bands appear to spread throughout the nuclear interior (Ferreira et al., 1997; Sadoni et al., 1999).

Both intranuclear repositioning and replication timing of chromosomal domains are established in early G1 phase of the cell cycle, suggesting that spatial distribution within the nucleus is tightly coupled to the establishment of a replication timing program (Dimitrova and Gilbert, 1999). Given that during cellular differentiation there are changes in replication timing, which are often coupled to changes in transcriptional activity (see Dimitrova and Gilbert, 1999 and references therein), a key issue is whether spatial position within the nucleus plays an epigenetic regulatory role in gene expression.

Heterochromatin has long been known to inactivate genes. For example, when a normally euchromatic gene is juxtaposed to heterochromatin by chromosome rearrangement, it can become transcriptionally silenced in a fraction of the cells. The mosaic expression of the transposed gene is called heterochromatic position-effect variegation, PEV (reviewed by Wakimoto, 1998). Although the classical explanation for PEV invokes the spreading of the heterochromatin state along the length of the chromosome into neighboring genes, there are cases of PEV for which a “trans-inactivation” mechanism has been proposed. A particularly well characterized example occurs when the insertion of a large block of heterochromatin into the coding sequence of the eye-color gene brown in Drosophila causes variegated inactivation of a normal copy of the gene present on a homologous chromosome. At defined stages of development, this insertion is shown to physically associate with centromeric chromatin on the same chromosome in a stochastic manner (Dernburg et al., 1996). Thus, in this case the association with heterochromatin responsible for variegation results from long-distance looping rather than from linear proximity along the chromosome. Additional examples of silencing trans-interactions between tissue-specific genes and centromeric heterochromatin were recently described in mammalian lymphoid cells (Brown et al., 1997, 1999).

Despite current evidence indicating that the intranuclear positioning of genomic loci relative to centromeric heterochromatin affects their transcriptional activity, very little is known about the principles governing the spatial distribution within the nucleus of centromeres per se. During interphase, centromeric heterochromatin is predominantly located either at the nuclear periphery or around the nucleolus (reviewed in Haaf and Schmid, 1991; Pluta et al., 1995). Is this distribution stochastic, or are there defined positional constraints for individual centromeres? Clearly, the centromeres of chromosomes that contain genes coding for rRNA (i.e., the nucleolar organizing region or NOR) are expected to associate with the nucleolus. However, the question remains for the centromeres of chromosomes without NOR. Are these randomly distributed between the nuclear periphery and the nucleolus? To address this question, we used fluorescence in situ hybridization to differentially tag the centromeric heterochromatin of 15 human chromosomes, and confocal microscopy to determine their three-dimensional distribution pattern within the nucleus of quiescent lymphoid cells. Our results reveal that the positioning of centromeric heterochromatin relative to the nuclear envelope and the nucleolus tends to be specific for each chromosome. Most important, centromeric positioning can be predicted, taking into account the abundance of G-dark bands in the same chromosome. We propose a model for positioning of centromeres during interphase based on intra- and interchromosomal interactions between constitutive and facultative heterochromatin domains in the nucleus.

The Two Types of Heterochromatin: Constitutive and Facultative

It is not surprising that the way in which the DNA is packaged is related to the cell cycle. When the DNA needs to be copied (replicated) and proteins need to be synthesized (transcription and then translation), the DNA is found in the euchromatin form. When genes do not need to be replicated and transcribed, the DNA is in the heterochromatin form. Furthermore, when the DNA is in the active chromosome form, the cell is in the interphase stage of the cell cycle, and when it is in the metaphase chromosome form, the cell is dividing, i.e., it is in mitosis or meiosis.

In line with this, it has been proposed that regulating the way in which the DNA is packaged is a way of regulating gene expression. Therefore, housekeeping genes that maintain the functions and survival of the cell are always in the euchromatin form, whereas those that do not need to be expressed are in the heterochromatin form. The means by which this is achieved is by modification of the histone tail, a part of the histones that can be acetylated or methylated. Modifying the histone tail results in changes in the packaging of the DNA. For instance, hypoacetylation on the histone tail is associated with the heterochromatic conformation, whereby DNA is not exposed and consequently gene transcription is prevented.

1. What does the presence of heterochromatin reveal?
A. That cells are transcriptionally active.
B. That cells are dividing.
C. That gene transcription is not taking place.
D. That DNA is exposed to polymerases and other regulatory proteins.

2. What are the two main differences between constitutive and facultative heterochromatin?
A. Constitutive heterochromatin is reversible and has LINE sequences, whereas facultative heterochromatin is stable and has satellite DNA.
B. Constitutive heterochromatin is stable and has LINE sequences, whereas facultative heterochromatin is reversible and has satellite DNA.
C. Constitutive heterochromatin is reversible and has satellite DNA, whereas facultative heterochromatin is stable and has LINE sequences.
D. Constitutive heterochromatin is stable and has satellite DNA, whereas facultative heterochromatin is reversible and has LINE sequences.

3. What is another name by which heterochromatin is known?
A. Beads-on-a-string
B. 30-nm fiber
C. Active chromosome
D. Metaphase chromosome


PREditOR (protein reading and editing of residues) effectively removes heterochromatin from pericentromeric regions

To manipulate the epigenetic status of defined chromatin classes, we designed a novel synthetic biology approach that allows us to tether chromatin Editors to specific regions of the genome, protein reading and editing of residues (PREditOR). PREditOR is based on the use of fusion proteins consisting of three domains (Supplementary Figure 1a): (i) a Reader domain that recognizes specific epigenetic modifications, (ii) a fluorescent marker to follow the localization of the fusion protein and (iii) a chromatin Editor that functions specifically at or near the tethering site. In order to analyse the role of pericentromeric heterochromatin on chromosome segregation, we fused the N-terminal chromodomain of H3K9-specific methyltransferase SUV39H1 (SUV39H1ΔSET) (a Reader of H3K9me3) to an EYFP marker (Fig. 1a, b). Removal of the SET domain ensures that this molecule functions solely as a Reader and not as an enzymatically active Editor.

Fig. 1 Tethering JMJD2D to heterochromatin decreases H3K9me3 levels. a Schematic of the PREditOR approach to tether chromatin modifiers to heterochromatin regions. b Schematic drawings of the SUV39H1ΔSET-EYFP fusion constructs. c Diagram of the experimental design. d Representative immunofluorescence images of HeLa cells expressing the indicated SUV39H1ΔSET-EYFP fusion proteins and stained for H3K9me3. Scale bar 10 μm. e Quantification of fluorescence signals of H3K9me3 staining in individual transfected cells as in d plotted as arbitrary fluorescence units (A.F.U). Solid bars indicate the medians of three independent experiments and error bars represent the standard error of the mean (s.e.m). Asterisks indicate statistically significant differences compared to EYFP (*P < 0.05, **P < 0.001, Student’s t test)

Immunofluorescence analysis after expression of the SUV39H1ΔSET-EYFP fusion protein in HeLa cells showed colocalization with H3K9me3 and CENP-B foci (Fig. 1d and Supplementary Figure 1b). Thus, this fusion protein targets specifically to pericentromeric heterochromatin. SUV39H1ΔSET-EYFP is released from chromatin in early mitosis and rebinds later in anaphase (Supplementary Figure 1c). This is most likely due to a methyl/phos switch effect caused by phosphorylation of histone H3 on Serine 10 catalysed by Aurora B kinase (Fischle et al. 2005; Hirota et al. 2005).

As an Editor to remove H3K9me3 from pericentromeric regions, we fused SUV39H1ΔSET-EYFP to the H3K9me3-specific demethylase JMJD2D/KDM4 (SUV39H1ΔSET-EYFP-JMJD2D WT) (Fig. 1b, c). Two control molecules were also constructed (Fig. 1b, c). The first was a catalytically dead mutant of JMJD2D carrying a mutation in its jmjC-enzymatic domain fused to SUV39H1ΔSET-EYFP (SUV39H1ΔSET-EYFP-JMJD2D D195A). This molecule targets to heterochromatin but cannot demethylate H3K9. The second was a binding-deficient mutant of SUV39H1ΔSET bearing two mutations of its chromatin-binding domain fused to wild type JMJD2D (SUV39H1ΔSET W61AY67A-EYFP-JMJD2D WT). This molecule has an active demethylase but cannot target specifically to heterochromatin.

Transient expression of SUV39H1ΔSET-EYFP-JMJD2D WT in HeLa cells for 48 h efficiently removed H3K9me3 from pericentromeric loci. Immunofluorescence analysis revealed significantly decreased levels of H3K9me3 in cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared to the transfection and tethering controls (EYFP and SUV39H1ΔSET-EYFP, respectively) (Fig. 1d, e). Importantly, no differences in H3K9me3 levels were observed after expressing either the catalytically dead mutant (SUV39H1ΔSET-EYFP-JMJD2D D195A) or the binding-deficient mutant (SUV39H1ΔSET W61AY67A-EYFP-JMJD2D WT) (Fig. 1d, e). Apparently, JMJD2D only efficiently demethylates H3K9me3 when it is tethered to heterochromatic regions. Consistent with these results, immunofluorescence staining for HP1α, another hallmark of heterochromatin, revealed a strongly significant decrease in HP1α foci in cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared with cells expressing the other control constructs (Supplementary Figure 1d and e).

We also investigated whether chromosomes overall looked more decondensed after expression of SUV39H1ΔSET-EYFP-JMJD2D WT fusion protein. Although there did appear to be some slight decompaction in live images, when chromosomes were fixed and spreads prepared, no significant differences were seen.

We conclude that PREditOR can effectively remove H3K9me3 and specifically disrupt heterochromatin, releasing downstream heterochromatin Readers such as HP1α. Importantly, JMJD2D only removes heterochromatin when it is tethered to the pericentromeric regions of chromosomes.

Heterochromatin removal causes a mitotic accumulation and chromosome segregation defects

To analyse the effects of heterochromatin removal on cell division, we expressed the different SUV39H1ΔSET-EYFP fusion proteins in HeLa cells for 48 h and examined their effects on mitosis. Our results show a threefold increase in the mitotic index of cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared to cells expressing the control fusion proteins (Fig. 2a). The control results demonstrate that SUV39H1ΔSET-EYFP binding to pericentromeric regions does not interfere with mitotic progression and that the increase in mitotic index is due to the demethylase activity of JMJD2D.

Fig. 2 Heterochromatin removal disrupts mitosis and chromosome segregation. a Analysis of the frequency of mitotic cells after expressing the indicated SUV39H1ΔSET-EYFP fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of five independent experiments. b Analysis of the frequency of each individual mitotic phase relative to the total number of mitoses. Data represent the mean and the standard error of the mean (s.e.m) of six independent experiments. c Representative IF images showing mitotic abnormalities in HeLa cells. Images show examples of chromosome bridges (top), lagging chromosomes (middle) and uncongressed chromosomes (bottom). d Analysis of the frequency of abnormal mitoses after expressing the indicated SUV39H1ΔSET-EYFP fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of four independent experiments. e Analysis of the frequency of mitotic cells showing bridges or lagging chromosomes after expression of the indicated SUV39H1ΔSET-EYFP fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of four independent experiments. f Representative IF images showing interphase abnormalities in HeLa cells. Images show a cell with micronucleus (top), and a binucleate cell (bottom). g Quantification of interphase abnormalities after expressing the indicated SUV39H1ΔSET-EYFP fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of three independent experiments. Asterisks indicate statistically significant differences compared to EYFP (*P < 0.05, **P < 0.01, ***P < 0.001, Student’s t test)

We observed significantly decreased levels of prophase, metaphase and anaphase cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared to controls (Fig. 2b). No difference was observed in the frequency of cells in telophase, though a small increase was seen for cells in cytokinesis.

In order to analyse the effects of heterochromatin removal on chromosome segregation, we quantified the frequencies of mitotic abnormalities in HeLa cells expressing the different SUV39H1ΔSET-EYFP fusion proteins. We quantified the frequencies of anaphase bridges, lagging chromosomes, uncongressed chromosomes in metaphase and malformed spindles. Overall, cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT showed a significantly increased frequency of abnormal mitosis compared to cells expressing the other vectors (40 vs 8–15%, respectively) (Fig. 2c, d). In particular, we observed significantly increased frequencies of lagging chromosomes and bridges in cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT (Fig. 2e). Although there was no significant increase in multipolar spindles as judged by pericentrin staining, we did see a high frequency of other spindle malformations (Supplementary Figure 2). Consistent with the increased frequencies of mitotic abnormalities, we also observed significantly increased frequencies of micronuclei, a sensitive reporter for chromosome segregation defects, in interphase cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared with controls (13 vs 4–6%) (Fig. 2f, g).

These data suggest that heterochromatin is necessary for correct chromosome segregation during mitosis and that its removal interferes with mitotic progression and chromosome segregation fidelity.

Perturbing heterochromatin leads to centromere defects

Centromeres direct the assembly of the kinetochore, a multi-protein complex that binds to microtubules and directs chromosome segregation (Fukagawa and Earnshaw 2014). However, some kinetochore proteins, including the Mis12 complex, have been reported to bind to the heterochromatin flanking the core centrochromatin (Obuse et al. 2004). In view of the chromosome segregation defects reported above, we asked whether heterochromatin removal is associated with kinetochore defects.

Immunofluorescence staining for the outer kinetochore protein HEC1 was performed after expression of the different SUV39H1ΔSET-EYFP fusion proteins for 48 h (the time point at which we observed significant defects on chromosome segregation). We observed mild but significant decreases in levels of HEC1 in cells expressing all of the SUV39H1ΔSET-EYFP vectors compared to the transfection control (Fig. 3a, b). This suggests that the binding of SUV39H1ΔSET-EYFP alone has an effect on kinetochore structure.

Fig. 3 Heterochromatin removal leads to centromere defects. a Representative immunofluorescence images of HeLa cells expressing the indicated SUV39H1ΔSET-EYFP fusion proteins and stained for HEC1. Scale bar 10 μm. b Quantification of fluorescence signals of HEC1 staining in individual cells transfected as in (a) plotted as arbitrary fluorescence units (A.F.U). Solid bars indicate the medians of two independent experiments and error bars represent the standard error of the mean (s.e.m). c Representative immunofluorescence images showing prometaphase cells with localized (top) or dispersed (bottom) SGO1, using CENP-A as centromere marker. Scale bar 10 μm. d Analysis of the frequency of cells showing localized or dispersed SGO1 staining after expressing the indicated SUV39H1ΔSET-EYFP fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of three independent experiments

Although all constructs showed statistically significant decreased levels of HEC1 compared with cells expressing EYFP, the greatest decrease was observed in cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT (−49%). Lesser decreases were observed in cells expressing SUV39H1ΔSET-EYFP (−36%), SUV39H1ΔSET-EYFP-JMJD2D D195A (−24%) and SUV39H1ΔSET W61AY67A-EYFP-JMJD2D WT (−9%). Therefore, perturbing heterochromatin has a deleterious effect on kinetochore structure. The SUV39H1ΔSET module may exert a dominant-negative effect by competing with Readers that bind to H3K9me3. This is consistent with the observation that the SUV39H1ΔSET binding mutant exhibited the mildest phenotype.

Pericentromeric heterochromatin has been associated with the maintenance of cohesin in metaphase (Nonaka et al. 2002). After prophase, cohesin complexes are removed from the chromosome arms, but are retained at centromeres as a result of the activity of Shugoshin 1 (SGO1) (Losada et al. 2002). Given previous links between heterochromatin and cohesin in S. pombe (Nonaka et al. 2002), we analysed the localization of SGO1 after expressing the different SUV39H1ΔSET-EYFP fusion proteins in HeLa cells. In transfection controls, SGO1 showed a clear centromeric localization in 95% of the cells (Fig. 3c, d). Expression of the different SUV39H1ΔSET-EYFP proteins resulted in significant increases in the frequency of cells with SGO1 dispersed on chromosome arms (Fig. 3c, d). Thus, SUV39H1ΔSET-EYFP binding to pericentromeric heterochromatin perturbs SGO1 centromeric localization. As was the case for HEC1 staining, cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT more frequently exhibited SGO1 localization defects than did cells expressing other SUV39H1ΔSET-EYFP controls (Fig. 3c, d).

We conclude that SUV39H1ΔSET-EYFP fusion proteins binding to pericentromeres generate mild defects on the kinetochore and SGO1. However, these defects are consistently higher after removing heterochromatin.

Heterochromatin cooperates with condensin to maintain centromeric stiffness

We and others previously showed that the condensin complex is important for maintaining the rigidity of the centromere (Gerlich et al. 2006; Ribeiro et al. 2009; Jaqaman et al. 2010). We hypothesized that condensin might act by regulating the compliance of centromeric heterochromatin (Ribeiro et al. 2009). To test the effect of removing heterochromatin on centromere stiffness, we expressed the various SUV39H1ΔSET-EYFP fusion proteins in HeLa cells for 48 h and analysed the distances between sister kinetochores on metaphase chromosomes. We observed a significant increase in this distance after expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared with controls (Fig. 4a, b). This supports the notion that pericentromeric heterochromatin has a role in maintaining centromeric stiffness.

Fig. 4 Heterochromatin is necessary to maintain the stiffness of the centromere in metaphase. a Representative immunofluorescence images of HeLa cells expressing the indicated SUV39H1ΔSET-EYFP fusion proteins and stained for CENP-C and Tubulin. b Quantification of intercentromeric distances in chromosomes under tension after expressing the indicated SUV39H1ΔSET-EYFP fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of three independent experiments. c Immunoblot of whole HeLa cell protein extract transfected with the indicated siRNA and DNAs. Immunoblot for SMC2 with Tubulin as a loading control. d Representative immunofluorescence images of HeLa cells expressing the indicated SUV39H1ΔSET-EYFP fusion proteins and transfected with the indicated siRNA. e Quantification of intercentromeric distances in chromosomes under tension after expressing the indicated SUV39H1ΔSET-EYFP fusion proteins and siRNAs. Data represent the mean and standard error of the mean (s.e.m) of three independent experiments. Asterisks indicate statistically significant differences compared to EYFP (*P < 0.05, **P < 0.01, ***P < 0.001, Student’s t test)

In order to investigate our hypothesis that there is an interaction between condensin and heterochromatin in maintaining centromeric stiffness, we partly depleted SMC2 in HeLa cells using published siRNAs (Gerlich et al. 2006). Western blot analysis showed a 61% decrease in SMC2 levels after siRNA transfection (Supplementary Figure 3a). This was confirmed by immunofluorescence analysis, which showed a reduction of SMC2 levels on chromosomes compared with the control siRNA (Supplementary Figure 3b). Although 39% of the SMC2 remained in cells under these conditions, we observed the characteristic phenotypes of condensin-depleted cells, including dramatic changes in chromosome morphology, increased frequencies of lagging chromosomes and chromosome bridges (Supplementary Figure 3b and c).

Once the conditions for SMC2 depletion with siRNA were established, we analysed the intercentromeric distances of metaphase chromosomes after expressing either SUV39H1ΔSET-EYFP or SUV39H1ΔSET-EYFP-JMJD2D WT in the presence or absence of SMC2 depletion (Fig. 4c). Consistent with previous results from our group (Ribeiro et al. 2009), we observed a strong increase in intercentromeric distances in cells depleted of SMC2 compared with those transfected with the control siRNA (Fig. 4d, e). Strikingly, our analysis showed further significant increases of intercentromeric distances in cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT compared with controls expressing SUV39H1ΔSET-EYFP. This additional effect upon removal of heterochromatin was seen both in the presence and absence of SMC2 (Fig. 4d, e).

These results show that heterochromatin cooperates with condensin to maintain centromeric stiffness. However, the additive nature of the observed effect suggests that condensin and heterochromatin make at least partly independent contributions.

Heterochromatin is essential for proper chromosome passenger complex localization

The chromosome passenger complex (CPC) of Survivin, INCENP, Borealin and its catalytic subunit Aurora B Kinase localizes to different targets during mitosis, where it regulates key mitotic events (Carmena et al. 2012). In early mitosis, the CPC is localized at inner centromeres, where it ensures that kinetochore-microtubule attachments are correct and regulates the spindle assembly checkpoint. During anaphase, it transfers to the midzone where it regulates the completion of cytokinesis (Fig. 5a) (Carmena et al. 2012).

Fig. 5 Heterochromatin removal disrupts chromosomal passenger localization in mitosis. a, b Representative immunofluorescence images of HeLa cells expressing SUV39H1ΔSET-EYFP (a) or SUV39H1ΔSET-EYFP-JMJD2D WT (b) fusion protein and stained for Survivin and Tubulin. Scale bar 10 μm. c Analysis of the frequency of cells showing dispersed CPC in prometaphase and metaphase after expressing the indicated SUV39H1ΔSET fusion proteins. Data represent the mean and standard error of the mean (s.e.m) of two independent experiments. Asterisks indicate statistically significant differences compared to EYFP (*P < 0.05, **P < 0.01, Student’s t test)

It has been reported that centromeric HP1 targets the CPC to centromeres in early mitosis (Ainsztein et al. 1998; Liu et al. 2014). In order to study the role of heterochromatin on CPC localization at centromeres, we expressed the different SUV39H1ΔSET-EYFP vectors in HeLa cells for 48 h and analysed the localization of the CPC by staining for Survivin (Fig. 5a, b). In control cells expressing SUV39H1ΔSET-EYFP, the CPC concentrates at centromeres during prometaphase (Fig. 5a, c). Strikingly, our immunofluorescence analysis of cells expressing SUV39H1ΔSET-EYFP-JMJD2D WT showed an increased frequency of cells with the CPC dispersed on the chromosome arms in early mitosis (Fig. 5b, c). Moreover, we observed defects in CPC transfer to the midzone in late mitosis (Fig. 5a, b, bottom panels). Expression of SUV39H1ΔSET-EYFP-JMJD2D WT led to an increased frequency of cells in late mitosis in which the CPC remained attached to chromosomes and failed to concentrate at the spindle midzone.

We conclude that heterochromatin is necessary for efficient CPC localization at centromeres and also for its transfer to the midzone in late mitosis.

Author information


Precursory Research for Embryonic Science and Technology (PRESTO) of Japan Science and Technology Agency (JST), National Institute of Genetics and The Graduate University for Advanced Studies, Mishima, 411-8540, Shizuoka, Japan

Tatsuo Fukagawa, Masahiro Nogami & Mitsuko Yoshikawa

Institute of Comprehensive Medical Science, Fujita Health University, Toyoake, 470-1101, Aichi, Japan

Masashi Ikeno & Tuneko Okazaki

Department of Biochemistry, Miyazaki Medical College, Kiyotake, 889-1692, Miyazaki, Japan

Yasunari Takami & Tatsuo Nakayama

Department of Biomedical Science, Institute of Regenerative Medicine and Biofunction, Graduate School of Medical Science, Tottori University, Nishimachi 86, Yonago, 683-8503, Tottori, Japan


The first human genome sequences were published in nearly complete draft form in February 2001 by the Human Genome Project [15] and Celera Corporation. [16] Completion of the Human Genome Project's sequencing effort was announced in 2004 with the publication of a draft genome sequence, leaving just 341 gaps in the sequence, representing highly-repetitive and other DNA that could not be sequenced with the technology available at the time. [8] The human genome was the first of all vertebrates to be sequenced to such near-completion, and as of 2018, the diploid genomes of over a million individual humans had been determined using next-generation sequencing. [17] In 2021 it was reported that the T2T consortium had filled in all of the gaps. Thus there came into existence a complete human genome with no gaps. [18]

These data are used worldwide in biomedical science, anthropology, forensics and other branches of science. Such genomic studies have led to advances in the diagnosis and treatment of diseases, and to new insights in many fields of biology, including human evolution.

In June 2016, scientists formally announced HGP-Write, a plan to synthesize the human genome. [19] [20]

Although the 'completion' of the Human Genome Project was announced in 2004 (a draft had been published in 2001 [14]), there remained hundreds of gaps, with about 5–10% of the total sequence remaining undetermined. The missing genetic information was mostly in repetitive heterochromatic regions and near the centromeres and telomeres, but also in some gene-encoding euchromatic regions. [21] There remained 160 euchromatic gaps in 2015, when the sequences spanning another 50 formerly unsequenced regions were determined. [22] Only in 2020 was the first truly complete telomere-to-telomere sequence of a human chromosome determined, namely that of the X chromosome. [23]

The total length of the human reference genome, which does not represent the sequence of any specific individual, is over 3 billion base pairs. The genome is organized into 22 paired chromosomes, termed autosomes, plus the 23rd pair of sex chromosomes: XX in the female and XY in the male. These are all large linear DNA molecules contained within the cell nucleus. The genome also includes the mitochondrial DNA, a comparatively small circular molecule present in multiple copies in each mitochondrion.

Human reference genome data, by chromosome [24]
Chromosome | Length (mm) | Base pairs | Variations | Protein-coding genes | Pseudogenes | Total long ncRNA | Total small ncRNA | miRNA | rRNA | snRNA | snoRNA | Misc ncRNA | Links | Centromere position (Mbp) | Cumulative (%)
1 | 85 | 248,956,422 | 12,151,146 | 2058 | 1220 | 1200 | 496 | 134 | 66 | 221 | 145 | 192 | EBI | 125 | 7.9
2 | 83 | 242,193,529 | 12,945,965 | 1309 | 1023 | 1037 | 375 | 115 | 40 | 161 | 117 | 176 | EBI | 93.3 | 16.2
3 | 67 | 198,295,559 | 10,638,715 | 1078 | 763 | 711 | 298 | 99 | 29 | 138 | 87 | 134 | EBI | 91 | 23
4 | 65 | 190,214,555 | 10,165,685 | 752 | 727 | 657 | 228 | 92 | 24 | 120 | 56 | 104 | EBI | 50.4 | 29.6
5 | 62 | 181,538,259 | 9,519,995 | 876 | 721 | 844 | 235 | 83 | 25 | 106 | 61 | 119 | EBI | 48.4 | 35.8
6 | 58 | 170,805,979 | 9,130,476 | 1048 | 801 | 639 | 234 | 81 | 26 | 111 | 73 | 105 | EBI | 61 | 41.6
7 | 54 | 159,345,973 | 8,613,298 | 989 | 885 | 605 | 208 | 90 | 24 | 90 | 76 | 143 | EBI | 59.9 | 47.1
8 | 50 | 145,138,636 | 8,221,520 | 677 | 613 | 735 | 214 | 80 | 28 | 86 | 52 | 82 | EBI | 45.6 | 52
9 | 48 | 138,394,717 | 6,590,811 | 786 | 661 | 491 | 190 | 69 | 19 | 66 | 51 | 96 | EBI | 49 | 56.3
10 | 46 | 133,797,422 | 7,223,944 | 733 | 568 | 579 | 204 | 64 | 32 | 87 | 56 | 89 | EBI | 40.2 | 60.9
11 | 46 | 135,086,622 | 7,535,370 | 1298 | 821 | 710 | 233 | 63 | 24 | 74 | 76 | 97 | EBI | 53.7 | 65.4
12 | 45 | 133,275,309 | 7,228,129 | 1034 | 617 | 848 | 227 | 72 | 27 | 106 | 62 | 115 | EBI | 35.8 | 70
13 | 39 | 114,364,328 | 5,082,574 | 327 | 372 | 397 | 104 | 42 | 16 | 45 | 34 | 75 | EBI | 17.9 | 73.4
14 | 36 | 107,043,718 | 4,865,950 | 830 | 523 | 533 | 239 | 92 | 10 | 65 | 97 | 79 | EBI | 17.6 | 76.4
15 | 35 | 101,991,189 | 4,515,076 | 613 | 510 | 639 | 250 | 78 | 13 | 63 | 136 | 93 | EBI | 19 | 79.3
16 | 31 | 90,338,345 | 5,101,702 | 873 | 465 | 799 | 187 | 52 | 32 | 53 | 58 | 51 | EBI | 36.6 | 82
17 | 28 | 83,257,441 | 4,614,972 | 1197 | 531 | 834 | 235 | 61 | 15 | 80 | 71 | 99 | EBI | 24 | 84.8
18 | 27 | 80,373,285 | 4,035,966 | 270 | 247 | 453 | 109 | 32 | 13 | 51 | 36 | 41 | EBI | 17.2 | 87.4
19 | 20 | 58,617,616 | 3,858,269 | 1472 | 512 | 628 | 179 | 110 | 13 | 29 | 31 | 61 | EBI | 26.5 | 89.3
20 | 21 | 64,444,167 | 3,439,621 | 544 | 249 | 384 | 131 | 57 | 15 | 46 | 37 | 68 | EBI | 27.5 | 91.4
21 | 16 | 46,709,983 | 2,049,697 | 234 | 185 | 305 | 71 | 16 | 5 | 21 | 19 | 24 | EBI | 13.2 | 92.6
22 | 17 | 50,818,468 | 2,135,311 | 488 | 324 | 357 | 78 | 31 | 5 | 23 | 23 | 62 | EBI | 14.7 | 93.8
X | 53 | 156,040,895 | 5,753,881 | 842 | 874 | 271 | 258 | 128 | 22 | 85 | 64 | 100 | EBI | 60.6 | 99.1
Y | 20 | 57,227,415 | 211,643 | 71 | 388 | 71 | 30 | 15 | 7 | 17 | 3 | 8 | EBI | 10.4 | 100
mtDNA | 0.0054 | 16,569 | 929 | 13 | 0 | 0 | 24 | 0 | 2 | 0 | 0 | 0 | EBI | N/A | 100
total | | 3,088,286,401 | 155,630,645 | 20412 | 14600 | 14727 | 5037 | 1756 | 532 | 1944 | 1521 | 2213 | | |

Original analysis published in the Ensembl database at the European Bioinformatics Institute (EBI) and Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs by 0.34 nanometers (the distance between base pairs in the most common structure of the DNA double helix; a recent estimate of human chromosome lengths based on updated data reports 205.00 cm for the diploid male genome and 208.23 cm for the female, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively [25] ). The number of proteins is based on the number of initial precursor mRNA transcripts, and does not include products of alternative pre-mRNA splicing, or modifications to protein structure that occur after translation.
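As a rough illustration of the length estimate described above, the following sketch multiplies a base-pair count by 0.34 nm and converts the result to millimetres. The function name `length_mm` is hypothetical and chosen only for this example; the base-pair counts are taken from the table.

```python
# Rise per base pair in the most common DNA double-helix structure, as stated in the text.
NM_PER_BASE_PAIR = 0.34

def length_mm(base_pairs: int) -> float:
    """Return the stretched-out length of a DNA molecule in millimetres."""
    return base_pairs * NM_PER_BASE_PAIR * 1e-6  # 1 mm = 1e6 nm

# Base-pair counts for chromosomes 1 and 21, copied from the table.
chr1_mm = length_mm(248_956_422)
chr21_mm = length_mm(46_709_983)
print(round(chr1_mm), round(chr21_mm))
```

Rounded to whole millimetres, these values agree with the length column of the table for chromosomes 1 and 21.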

Variations are unique DNA sequence differences that have been identified in the individual human genome sequences analyzed by Ensembl as of December 2016. The number of identified variations is expected to increase as further personal genomes are sequenced and analyzed. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below). Links open windows to the reference chromosome sequences in the EBI genome browser.

Small non-coding RNAs are RNAs of up to 200 bases that do not have protein-coding potential. These include: microRNAs, or miRNAs (post-transcriptional regulators of gene expression), small nuclear RNAs, or snRNAs (the RNA components of spliceosomes), and small nucleolar RNAs, or snoRNAs (involved in guiding chemical modifications to other RNA molecules). Long non-coding RNAs are RNA molecules longer than 200 bases that do not have protein-coding potential. These include: ribosomal RNAs, or rRNAs (the RNA components of ribosomes), and a variety of other long RNAs that are involved in regulation of gene expression, epigenetic modifications of DNA nucleotides and histone proteins, and regulation of the activity of protein-coding genes. Small discrepancies between total-small-ncRNA numbers and the numbers of specific types of small ncRNAs result from the former values being sourced from Ensembl release 87 and the latter from Ensembl release 68.

The number of genes in the human genome is not entirely clear because the function of numerous transcripts remains unclear. This is especially true for non-coding RNA. The number of protein-coding genes is better known but there are still on the order of 1,400 questionable genes which may or may not encode functional proteins, usually encoded by short open reading frames.

Discrepancies in human gene number estimates among different databases, as of July 2018 [26]
Gencode [27] Ensembl [28] Refseq [29] CHESS [30]
protein-coding genes 19,901 20,376 20,345 21,306
lncRNA genes 15,779 14,720 17,712 18,484
antisense RNA 5501 28 2694
miscellaneous RNA 2213 2222 13,899 4347
Pseudogenes 14,723 1740 15,952
total transcripts 203,835 203,903 154,484 328,827

Information content

The haploid human genome (23 chromosomes) is about 3 billion base pairs long and contains around 30,000 genes. [31] Since every base pair can be coded by 2 bits, this is about 750 megabytes of data. An individual somatic (diploid) cell contains twice this amount, that is, about 6 billion base pairs. Men have fewer base pairs than women because the Y chromosome is about 57 million base pairs long whereas the X is about 156 million. Since individual genomes vary in sequence by less than 1% from each other, the variations of a given human's genome from a common reference can be losslessly compressed to roughly 4 megabytes. [32]
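The 750-megabyte figure follows from simple arithmetic, sketched below under the assumption that a megabyte means 10^6 bytes. The function name `raw_size_megabytes` is hypothetical; the X and Y lengths are the base-pair counts from the chromosome table.

```python
BITS_PER_BASE = 2  # four possible bases, so log2(4) = 2 bits each

def raw_size_megabytes(base_pairs: int) -> float:
    """Uncompressed size of a sequence at 2 bits per base, in megabytes (10^6 bytes)."""
    return base_pairs * BITS_PER_BASE / 8 / 1_000_000

haploid_mb = raw_size_megabytes(3_000_000_000)

# Difference in base pairs between an XX and an XY diploid genome,
# using the X and Y lengths from the chromosome table.
x_minus_y = 156_040_895 - 57_227_415
print(haploid_mb, x_minus_y)
```

The second value, just under 99 million base pairs, is the amount by which a male diploid genome is shorter than a female one.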

The entropy rate of the genome differs significantly between coding and non-coding sequences. It is close to the maximum of 2 bits per base pair for the coding sequences (about 45 million base pairs), but less for the non-coding parts. It ranges between 1.5 and 1.9 bits per base pair for the individual chromosome, except for the Y-chromosome, which has an entropy rate below 0.9 bits per base pair. [33]
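To make the 2-bit maximum concrete, here is a minimal sketch that computes the single-base (zero-order) Shannon entropy of a sequence. This is a simplification and not the method used in the cited analysis: a true entropy rate also accounts for dependencies between neighbouring bases, so this value is only an upper bound on it. The function name and the toy sequences are hypothetical.

```python
from collections import Counter
from math import log2

def entropy_bits_per_base(seq: str) -> float:
    """Zero-order Shannon entropy of the base composition, in bits per base."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# Equal use of all four bases reaches the 2-bit maximum;
# a skewed composition carries less information per base.
print(entropy_bits_per_base("ACGT" * 10))
print(entropy_bits_per_base("AAAT"))
```

A sequence dominated by one base, or by a repeated motif, scores well below 2 bits, which is the intuition behind the lower values reported for repeat-rich regions.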

The content of the human genome is commonly divided into coding and noncoding DNA sequences. Coding DNA is defined as those sequences that can be transcribed into mRNA and translated into proteins during the human life cycle; these sequences occupy only a small fraction of the genome (<2%). Noncoding DNA is made up of all of those sequences (ca. 98% of the genome) that are not used to encode proteins.

Some noncoding DNA contains genes for RNA molecules with important biological functions (noncoding RNA, for example ribosomal RNA and transfer RNA). The exploration of the function and evolutionary origin of noncoding DNA is an important goal of contemporary genome research, including the ENCODE (Encyclopedia of DNA Elements) project, which aims to survey the entire human genome, using a variety of experimental tools whose results are indicative of molecular activity.

Because non-coding DNA greatly outnumbers coding DNA, the concept of the sequenced genome has become a more focused analytical concept than the classical concept of the DNA-coding gene. [34] [35]

Protein-coding sequences represent the most widely studied and best understood component of the human genome. These sequences ultimately lead to the production of all human proteins, although several biological processes (e.g. DNA rearrangements and alternative pre-mRNA splicing) can lead to the production of many more unique proteins than the number of protein-coding genes. The complete modular protein-coding capacity of the genome is contained within the exome, and consists of DNA sequences encoded by exons that can be translated into proteins. Because of its biological importance, and the fact that it constitutes less than 2% of the genome, sequencing of the exome was the first major milepost of the Human Genome Project.

Number of protein-coding genes. About 20,000 human proteins have been annotated in databases such as Uniprot. [37] Historically, estimates for the number of protein genes have varied widely, ranging up to 2,000,000 in the late 1960s, [38] but several researchers pointed out in the early 1970s that the estimated mutational load from deleterious mutations placed an upper limit of approximately 40,000 for the total number of functional loci (this includes protein-coding and functional non-coding genes). [39] The number of human protein-coding genes is not significantly larger than that of many less complex organisms, such as the roundworm and the fruit fly. This difference may result from the extensive use of alternative pre-mRNA splicing in humans, which provides the ability to build a very large number of modular proteins through the selective incorporation of exons.

Protein-coding capacity per chromosome. Protein-coding genes are distributed unevenly across the chromosomes, ranging from a few dozen to more than 2000, with an especially high gene density within chromosomes 1, 11, and 19. Each chromosome contains various gene-rich and gene-poor regions, which may be correlated with chromosome bands and GC-content. [40] The significance of these nonrandom patterns of gene density is not well understood. [41]

Size of protein-coding genes. The size of protein-coding genes within the human genome shows enormous variability. For example, the gene for histone H1a (HIST1H1A) is relatively small and simple, lacking introns and encoding a 781-nucleotide-long mRNA that produces a 215-amino-acid protein from its 648-nucleotide open reading frame. Dystrophin (DMD) was the largest protein-coding gene in the 2001 human reference genome, spanning a total of 2.2 million nucleotides, [42] while a more recent systematic meta-analysis of updated human genome data identified an even larger protein-coding gene, RBFOX1 (RNA binding protein, fox-1 homolog 1), spanning a total of 2.47 million nucleotides. [43] Titin (TTN) has the longest coding sequence (114,414 nucleotides), the largest number of exons (363), [42] and the longest single exon (17,106 nucleotides). As estimated from a curated set of protein-coding genes over the whole genome, the median gene size is 26,288 nucleotides (mean = 66,577), the median exon size is 133 nucleotides (mean = 309), the median number of exons is 8 (mean = 11), and the median encoded protein is 425 amino acids long (mean = 553). [43]

Examples of human protein-coding genes [44]
Protein | Chrom | Gene | Length | Exons | Exon length | Intron length | Alt splicing
Breast cancer type 2 susceptibility protein | 13 | BRCA2 | 83,736 | 27 | 11,386 | 72,350 | yes
Cystic fibrosis transmembrane conductance regulator | 7 | CFTR | 202,881 | 27 | 4,440 | 198,441 | yes
Cytochrome b | MT | MTCYB | 1,140 | 1 | 1,140 | 0 | no
Dystrophin | X | DMD | 2,220,381 | 79 | 10,500 | 2,209,881 | yes
Glyceraldehyde-3-phosphate dehydrogenase | 12 | GAPDH | 4,444 | 9 | 1,425 | 3,019 | yes
Hemoglobin beta subunit | 11 | HBB | 1,605 | 3 | 626 | 979 | no
Histone H1A | 6 | HIST1H1A | 781 | 1 | 781 | 0 | no
Titin | 2 | TTN | 281,434 | 364 | 104,301 | 177,133 | yes
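The intron-length column in the table above equals the gene length minus the total exon length. The short sketch below checks that relationship for every row; the dictionary layout and the function name `intron_length` are hypothetical conveniences, while the numbers are copied from the table.

```python
# (gene length, total exon length) in nucleotides, copied from the table.
GENES = {
    "BRCA2": (83_736, 11_386),
    "CFTR": (202_881, 4_440),
    "MTCYB": (1_140, 1_140),
    "DMD": (2_220_381, 10_500),
    "GAPDH": (4_444, 1_425),
    "HBB": (1_605, 626),
    "HIST1H1A": (781, 781),
    "TTN": (281_434, 104_301),
}

def intron_length(gene_length: int, exon_length: int) -> int:
    """Total intron length, assuming the gene consists only of exons and introns."""
    return gene_length - exon_length

for name, (gene_len, exon_len) in GENES.items():
    introns = intron_length(gene_len, exon_len)
    exon_share = 100 * exon_len / gene_len
    print(f"{name}: introns = {introns:,} nt, exons = {exon_share:.1f}% of gene")
```

For the intron-containing genes the exon share ranges from under 1% (DMD) to about 39% (HBB), which illustrates how strongly introns dominate the length of large human genes.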

Noncoding DNA is defined as all of the DNA sequences within a genome that are not found within protein-coding exons, and so are never represented within the amino acid sequence of expressed proteins. By this definition, more than 98% of the human genome is composed of ncDNA.

Numerous classes of noncoding DNA have been identified, including genes for noncoding RNA (e.g. tRNA and rRNA), pseudogenes, introns, untranslated regions of mRNA, regulatory DNA sequences, repetitive DNA sequences, and sequences related to mobile genetic elements.

Numerous sequences that are included within genes are also defined as noncoding DNA. These include genes for noncoding RNA (e.g. tRNA, rRNA), and untranslated components of protein-coding genes (e.g. introns, and 5' and 3' untranslated regions of mRNA).

Protein-coding sequences (specifically, coding exons) constitute less than 1.5% of the human genome. [14] In addition, about 26% of the human genome is introns. [45] Aside from genes (exons and introns) and known regulatory sequences (8–20%), the human genome contains regions of noncoding DNA. The exact amount of noncoding DNA that plays a role in cell physiology has been hotly debated. Recent analysis by the ENCODE project indicates that 80% of the entire human genome is either transcribed, binds to regulatory proteins, or is associated with some other biochemical activity. [12]

It remains controversial, however, whether all of this biochemical activity contributes to cell physiology, or whether a substantial portion of it is the result of transcriptional and biochemical noise, which must be actively filtered out by the organism. [46] Excluding protein-coding sequences, introns, and regulatory regions, much of the non-coding DNA is composed of: Many DNA sequences that do not play a role in gene expression have important biological functions. Comparative genomics studies indicate that about 5% of the genome contains sequences of noncoding DNA that are highly conserved, sometimes on time-scales representing hundreds of millions of years, implying that these noncoding regions are under strong evolutionary pressure and purifying selection. [47]

Many of these sequences regulate the structure of chromosomes by limiting the regions of heterochromatin formation and regulating structural features of the chromosomes, such as the telomeres and centromeres. Other noncoding regions serve as origins of DNA replication. Finally several regions are transcribed into functional noncoding RNA that regulate the expression of protein-coding genes (for example [48] ), mRNA translation and stability (see miRNA), chromatin structure (including histone modifications, for example [49] ), DNA methylation (for example [50] ), DNA recombination (for example [51] ), and cross-regulate other noncoding RNAs (for example [52] ). It is also likely that many transcribed noncoding regions do not serve any role and that this transcription is the product of non-specific RNA Polymerase activity. [46]

Pseudogenes

Pseudogenes are inactive copies of protein-coding genes, often generated by gene duplication, that have become nonfunctional through the accumulation of inactivating mutations. The number of pseudogenes in the human genome is on the order of 13,000, [53] and in some chromosomes is nearly the same as the number of functional protein-coding genes. Gene duplication is a major mechanism through which new genetic material is generated during molecular evolution.

For example, the olfactory receptor gene family is one of the best-documented examples of pseudogenes in the human genome. More than 60 percent of the genes in this family are non-functional pseudogenes in humans. By comparison, only 20 percent of genes in the mouse olfactory receptor gene family are pseudogenes. Research suggests that this is a species-specific characteristic, as the most closely related primates all have proportionally fewer pseudogenes. This genetic discovery helps to explain the less acute sense of smell in humans relative to other mammals. [54]

Genes for noncoding RNA (ncRNA)

Noncoding RNA molecules play many essential roles in cells, especially in the many reactions of protein synthesis and RNA processing. Noncoding RNA include tRNA, ribosomal RNA, microRNA, snRNA and other non-coding RNA genes including about 60,000 long non-coding RNAs (lncRNAs). [12] [55] [56] [57] Although the number of reported lncRNA genes continues to rise and the exact number in the human genome is yet to be defined, many of them are argued to be non-functional. [58]

Many ncRNAs are critical elements in gene regulation and expression. Noncoding RNA also contributes to epigenetics, transcription, RNA splicing, and the translational machinery. The role of RNA in genetic regulation and disease offers a new potential level of unexplored genomic complexity. [59]

Introns and untranslated regions of mRNA

In addition to the ncRNA molecules that are encoded by discrete genes, the initial transcripts of protein coding genes usually contain extensive noncoding sequences, in the form of introns, 5'-untranslated regions (5'-UTR), and 3'-untranslated regions (3'-UTR). Within most protein-coding genes of the human genome, the length of intron sequences is 10- to 100-times the length of exon sequences.

Regulatory DNA sequences

The human genome has many different regulatory sequences which are crucial to controlling gene expression. Conservative estimates indicate that these sequences make up 8% of the genome, [60] whereas extrapolations from the ENCODE project suggest that 20% [61] to 40% [62] of the genome is gene regulatory sequence. Some types of non-coding DNA are genetic "switches" that do not encode proteins, but do regulate when and where genes are expressed (called enhancers). [63]

Regulatory sequences have been known since the late 1960s. [64] The first identification of regulatory sequences in the human genome relied on recombinant DNA technology. [65] Later with the advent of genomic sequencing, the identification of these sequences could be inferred by evolutionary conservation. The evolutionary branch between the primates and mouse, for example, occurred 70–90 million years ago. [66] So computer comparisons of gene sequences that identify conserved non-coding sequences will be an indication of their importance in duties such as gene regulation. [67]

Other genomes have been sequenced with the same intention of aiding conservation-guided methods, for example the pufferfish genome. [68] However, regulatory sequences disappear and re-evolve during evolution at a high rate. [69] [70] [71]

As of 2012, the efforts have shifted toward finding interactions between DNA and regulatory proteins by the technique ChIP-Seq, or gaps where the DNA is not packaged by histones (DNase hypersensitive sites), both of which tell where there are active regulatory sequences in the investigated cell type. [60]

Repetitive DNA sequences

Repetitive DNA sequences comprise approximately 50% of the human genome. [72]

About 8% of the human genome consists of tandem DNA arrays or tandem repeats, low complexity repeat sequences that have multiple adjacent copies (e.g. "CAGCAGCAG..."). [73] The tandem sequences may be of variable lengths, from two nucleotides to tens of nucleotides. These sequences are highly variable, even among closely related individuals, and so are used for genealogical DNA testing and forensic DNA analysis. [74]

Repeated sequences of fewer than ten nucleotides (e.g. the dinucleotide repeat (AC)n) are termed microsatellite sequences. Among the microsatellite sequences, trinucleotide repeats are of particular importance, as they sometimes occur within coding regions of genes for proteins and may lead to genetic disorders. For example, Huntington's disease results from an expansion of the trinucleotide repeat (CAG)n within the Huntingtin gene on human chromosome 4. Telomeres (the ends of linear chromosomes) end with a microsatellite hexanucleotide repeat of the sequence (TTAGGG)n.

Tandem repeats of longer sequences (arrays of repeated sequences 10–60 nucleotides long) are termed minisatellites.
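The definitions above can be sketched in code: one helper counts the longest uninterrupted run of a repeat unit, and another labels a repeat by unit length using the thresholds given in the text (fewer than ten nucleotides for microsatellites, 10–60 for minisatellites). Both function names and the toy sequence are hypothetical; real repeat annotation tools also tolerate imperfect copies, which this sketch does not.

```python
import re

def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of consecutive, exact copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)

def repeat_class(unit_length: int) -> str:
    """Label a tandem repeat by the length of its repeat unit."""
    if unit_length < 10:
        return "microsatellite"
    if unit_length <= 60:
        return "minisatellite"
    return "other"

toy = "TTCAGCAGCAGCAGTT"  # made-up sequence containing (CAG)4
print(longest_tandem_run(toy, "CAG"), repeat_class(len("CAG")))
print(repeat_class(len("TTAGGG")))  # telomeric hexanucleotide unit
```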

Mobile genetic elements (transposons) and their relics

Transposable genetic elements, DNA sequences that can replicate and insert copies of themselves at other locations within a host genome, are an abundant component in the human genome. The most abundant transposon lineage, Alu, has about 50,000 active copies, [75] and can be inserted into intragenic and intergenic regions. [76] One other lineage, LINE-1, has about 100 active copies per genome (the number varies between people). [77] Together with non-functional relics of old transposons, they account for over half of total human DNA. [78] Sometimes called "jumping genes", transposons have played a major role in sculpting the human genome. Some of these sequences represent endogenous retroviruses, DNA copies of viral sequences that have become permanently integrated into the genome and are now passed on to succeeding generations.

Mobile elements within the human genome can be classified into LTR retrotransposons (8.3% of total genome), SINEs (13.1% of total genome) including Alu elements, LINEs (20.4% of total genome), SVAs and Class II DNA transposons (2.9% of total genome).

Human reference genome

With the exception of identical twins, all humans show significant variation in genomic DNA sequences. The human reference genome (HRG) is used as a standard sequence reference.

There are several important points concerning the human reference genome:

  • The HRG is a haploid sequence. Each chromosome is represented once.
  • The HRG is a composite sequence, and does not correspond to any actual human individual.
  • The HRG is periodically updated to correct errors, ambiguities, and unknown "gaps".
  • The HRG in no way represents an "ideal" or "perfect" human individual. It is simply a standardized representation or model that is used for comparative purposes.

The Genome Reference Consortium is responsible for updating the HRG. Version 38 was released in December 2013. [79]

Measuring human genetic variation

Most studies of human genetic variation have focused on single-nucleotide polymorphisms (SNPs), which are substitutions in individual bases along a chromosome. Most analyses estimate that SNPs occur 1 in 1000 base pairs, on average, in the euchromatic human genome, although they do not occur at a uniform density. Thus follows the popular statement that "we are all, regardless of race, genetically 99.9% the same", [80] although this would be somewhat qualified by most geneticists. For example, a much larger fraction of the genome is now thought to be involved in copy number variation. [81] A large-scale collaborative effort to catalog SNP variations in the human genome is being undertaken by the International HapMap Project.
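The "99.9%" statement is a direct consequence of the one-SNP-per-1000-bases estimate, as the following sketch shows. The function names are hypothetical, and the 3 billion base-pair genome size is the rounded haploid figure used elsewhere in this text; restricting the count to the euchromatic portion would give a somewhat smaller number.

```python
def percent_identity(bases_per_snp: int = 1000) -> float:
    """Percentage of positions that match when one base in `bases_per_snp` differs."""
    return 100.0 * (1 - 1 / bases_per_snp)

def expected_snp_differences(genome_bp: int, bases_per_snp: int = 1000) -> int:
    """Approximate number of single-base differences between two haploid genomes."""
    return genome_bp // bases_per_snp

print(percent_identity())                        # about 99.9
print(expected_snp_differences(3_000_000_000))   # about three million
```

So two genomes that are 99.9% identical at the single-base level still differ at roughly three million positions, before copy number variation is considered.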

The genomic loci and length of certain types of small repetitive sequences are highly variable from person to person, which is the basis of DNA fingerprinting and DNA paternity testing technologies. The heterochromatic portions of the human genome, which total several hundred million base pairs, are also thought to be quite variable within the human population (they are so repetitive and so long that they cannot be accurately sequenced with current technology). These regions contain few genes, and it is unclear whether any significant phenotypic effect results from typical variation in repeats or heterochromatin.

Most gross genomic mutations in gamete germ cells probably result in inviable embryos; however, a number of human diseases are related to large-scale genomic abnormalities. Down syndrome, Turner syndrome, and a number of other diseases result from nondisjunction of entire chromosomes. Cancer cells frequently have aneuploidy of chromosomes and chromosome arms, although a cause-and-effect relationship between aneuploidy and cancer has not been established.

Mapping human genomic variation

Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome. [82] [83]

An example of a variation map is the HapMap being developed by the International HapMap Project. The HapMap is a haplotype map of the human genome, "which will describe the common patterns of human DNA sequence variation." [84] It catalogs the patterns of small-scale variations in the genome that involve single DNA letters, or bases.

Researchers published the first sequence-based map of large-scale structural variation across the human genome in the journal Nature in May 2008. [85] [86] Large-scale structural variations are differences in the genome among people that range from a few thousand to a few million DNA bases; some are gains or losses of stretches of genome sequence and others appear as re-arrangements of stretches of sequence. These variations include differences in the number of copies individuals have of a particular gene, deletions, translocations and inversions.

Structural variation

Structural variation refers to genetic variants that affect larger segments of the human genome, as opposed to point mutations. Often, structural variants (SVs) are defined as variants of 50 base pairs (bp) or greater, such as deletions, duplications, insertions, inversions and other rearrangements. About 90% of structural variants are noncoding deletions, and most individuals have more than a thousand such deletions; their size ranges from dozens of base pairs to tens of thousands of bp. [87] On average, individuals carry about 3 rare structural variants that alter coding regions, e.g. by deleting exons. About 2% of individuals carry ultra-rare megabase-scale structural variants, especially rearrangements; that is, millions of base pairs may be inverted within a chromosome. Ultra-rare means that they are found only in single individuals or their family members and thus have arisen very recently. [87]

SNP frequency across the human genome

Single-nucleotide polymorphisms (SNPs) do not occur homogeneously across the human genome. In fact, there is enormous diversity in SNP frequency between genes, reflecting different selective pressures on each gene as well as different mutation and recombination rates across the genome. However, because studies on SNPs are biased towards coding regions, the data generated from them are unlikely to reflect the overall distribution of SNPs throughout the genome. Therefore, the SNP Consortium protocol was designed to identify SNPs with no bias towards coding regions, and the Consortium's 100,000 SNPs generally reflect sequence diversity across the human chromosomes. The SNP Consortium aimed to expand the number of SNPs identified across the genome to 300,000 by the end of the first quarter of 2001. [88]

Changes in non-coding sequence and synonymous changes in coding sequence are generally more common than non-synonymous changes, reflecting greater selective pressure reducing diversity at positions dictating amino acid identity. Transitional changes are more common than transversions, with CpG dinucleotides showing the highest mutation rate, presumably due to deamination.
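The distinction between transitions and transversions can be expressed compactly: a transition swaps one purine for the other (A and G) or one pyrimidine for the other (C and T), while a transversion swaps a purine for a pyrimidine or vice versa. The sketch below is a minimal illustration; the function name `classify_substitution` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

# C to T is the change expected from deamination at CpG sites.
print(classify_substitution("C", "T"))
print(classify_substitution("A", "C"))
```

Of the twelve possible single-base changes, only four are transitions, so their higher observed frequency reflects mutational bias rather than the number of available outcomes.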

Personal genomes

A personal genome sequence is a (nearly) complete sequence of the chemical base pairs that make up the DNA of a single person. Because medical treatments have different effects on different people due to genetic variations such as single-nucleotide polymorphisms (SNPs), the analysis of personal genomes may lead to personalized medical treatment based on individual genotypes. [89]

The first personal genome sequence to be determined was that of Craig Venter in 2007. Personal genomes had not been sequenced in the public Human Genome Project to protect the identity of volunteers who provided DNA samples. That sequence was derived from the DNA of several volunteers from a diverse population. [90] However, early in the Venter-led Celera Genomics genome sequencing effort the decision was made to switch from sequencing a composite sample to using DNA from a single individual, later revealed to have been Venter himself. Thus the Celera human genome sequence released in 2000 was largely that of one man. Subsequent replacement of the early composite-derived data and determination of the diploid sequence, representing both sets of chromosomes, rather than a haploid sequence originally reported, allowed the release of the first personal genome. [91] In April 2008, that of James Watson was also completed. In 2009, Stephen Quake published his own genome sequence derived from a sequencer of his own design, the Heliscope. [92] A Stanford team led by Euan Ashley published a framework for the medical interpretation of human genomes implemented on Quake’s genome and made whole genome-informed medical decisions for the first time. [93] That team further extended the approach to the West family, the first family sequenced as part of Illumina’s Personal Genome Sequencing program. [94] Since then hundreds of personal genome sequences have been released, [95] including those of Desmond Tutu, [96] [97] and of a Paleo-Eskimo. [98] In 2012, the whole genome sequences of two family trios among 1092 genomes was made public. [3] In November 2013, a Spanish family made four personal exome datasets (about 1% of the genome) publicly available under a Creative Commons public domain license. [99] [100] The Personal Genome Project (started in 2005) is among the few to make both genome sequences and corresponding medical phenotypes publicly available. [101] [102]

The sequencing of individual genomes further unveiled levels of genetic complexity that had not been appreciated before. Personal genomics helped reveal the significant level of diversity in the human genome attributed not only to SNPs but structural variations as well. However, the application of such knowledge to the treatment of disease and in the medical field is only in its very beginnings. [103] Exome sequencing has become increasingly popular as a tool to aid in diagnosis of genetic disease because the exome contributes only 1% of the genomic sequence but accounts for roughly 85% of mutations that contribute significantly to disease. [104]

Human knockouts

In humans, loss-of-function gene knockouts occur naturally in heterozygous or homozygous form. These knockouts are often difficult to distinguish, especially within heterogeneous genetic backgrounds. They are also difficult to find because they occur at low frequencies.

Populations with high rates of consanguinity, such as those in countries where first-cousin marriages are common, display the highest frequencies of homozygous gene knockouts. Such populations include those of Pakistan and Iceland, as well as Amish communities. These populations, with their high level of parental relatedness, have been the subjects of human knockout research, which has helped to determine the function of specific genes in humans. By distinguishing specific knockouts, researchers can use phenotypic analyses of these individuals to help characterize the gene that has been knocked out.
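The link between parental relatedness and homozygous knockouts can be illustrated with the standard population-genetics expression for homozygote frequency under inbreeding, q² + F·q·(1 − q), where q is the allele frequency and F the inbreeding coefficient (F = 1/16 for the children of first cousins). The sketch below is only an illustration: the allele frequency is a hypothetical value, not a figure from the studies discussed here, and the function name is an assumption.

```python
def homozygous_frequency(q: float, f: float = 0.0) -> float:
    """Expected frequency of homozygotes for an allele of frequency q.

    f is the inbreeding coefficient; f = 0 corresponds to random mating,
    in which case the result reduces to the Hardy-Weinberg value q**2.
    """
    if not (0.0 <= q <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("q and f must both lie between 0 and 1")
    return q * q + f * q * (1.0 - q)


Q_KNOCKOUT_ALLELE = 0.001     # hypothetical loss-of-function allele frequency
F_FIRST_COUSINS = 1.0 / 16.0  # inbreeding coefficient for offspring of first cousins

random_mating = homozygous_frequency(Q_KNOCKOUT_ALLELE)
cousin_offspring = homozygous_frequency(Q_KNOCKOUT_ALLELE, F_FIRST_COUSINS)
fold_increase = cousin_offspring / random_mating

print(f"Random mating:           {random_mating:.3e}")
print(f"First-cousin offspring:  {cousin_offspring:.3e}")
print(f"Fold increase:           {fold_increase:.1f}")
```

The rarer the allele, the larger the relative increase, which is consistent with rare homozygous knockouts being found mostly in populations with high consanguinity.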

Knockouts in specific genes can cause genetic diseases, potentially have beneficial effects, or even result in no phenotypic effect at all. However, determining a knockout's phenotypic effect in humans can be challenging. Challenges to characterizing and clinically interpreting knockouts include the difficulty of calling DNA variants, determining disruption of protein function (annotation), and assessing how much mosaicism influences the phenotype. [105]

One major study that investigated human knockouts is the Pakistan Risk of Myocardial Infarction study. It was found that individuals possessing a heterozygous loss-of-function gene knockout for the APOC3 gene had lower triglycerides in the blood after consuming a high fat meal as compared to individuals without the mutation. However, individuals possessing homozygous loss-of-function gene knockouts of the APOC3 gene displayed the lowest level of triglycerides in the blood after the fat load test, as they produce no functional APOC3 protein. [106]

Most aspects of human biology involve both genetic (inherited) and non-genetic (environmental) factors. Some inherited variation influences aspects of our biology that are not medical in nature (height, eye color, ability to taste or smell certain compounds, etc.). Moreover, some genetic disorders only cause disease in combination with the appropriate environmental factors (such as diet). With these caveats, genetic disorders may be described as clinically defined diseases caused by genomic DNA sequence variation. In the most straightforward cases, the disorder can be associated with variation in a single gene. For example, cystic fibrosis is caused by mutations in the CFTR gene and is the most common recessive disorder in Caucasian populations, with over 1,300 different mutations known. [107]

Disease-causing mutations in specific genes are usually severe in terms of gene function and are fortunately rare, so genetic disorders are individually rare as well. However, since there are many genes that can vary to cause genetic disorders, in aggregate they constitute a significant component of known medical conditions, especially in pediatric medicine. Molecularly characterized genetic disorders are those for which the underlying causal gene has been identified. Currently there are approximately 2,200 such disorders annotated in the OMIM database. [107]

Studies of genetic disorders are often performed by means of family-based studies. In some instances, population based approaches are employed, particularly in the case of so-called founder populations such as those in Finland, French-Canada, Utah, Sardinia, etc. Diagnosis and treatment of genetic disorders are usually performed by a geneticist-physician trained in clinical/medical genetics. The results of the Human Genome Project are likely to provide increased availability of genetic testing for gene-related disorders, and eventually improved treatment. Parents can be screened for hereditary conditions and counselled on the consequences, the probability of inheritance, and how to avoid or ameliorate it in their offspring.

There are many different kinds of DNA sequence variation, ranging from complete extra or missing chromosomes down to single nucleotide changes. It is generally presumed that much naturally occurring genetic variation in human populations is phenotypically neutral, i.e., has little or no detectable effect on the physiology of the individual (although there may be fractional differences in fitness defined over evolutionary time frames). Genetic disorders can be caused by any or all known types of sequence variation. To molecularly characterize a new genetic disorder, it is necessary to establish a causal link between a particular genomic sequence variant and the clinical disease under investigation. Such studies constitute the realm of human molecular genetics.

With the advent of the Human Genome and International HapMap Project, it has become feasible to explore subtle genetic influences on many common disease conditions such as diabetes, asthma, migraine, schizophrenia, etc. Although some causal links have been made between genomic sequence variants in particular genes and some of these diseases, often with much publicity in the general media, these are usually not considered to be genetic disorders per se as their causes are complex, involving many different genetic and environmental factors. Thus there may be disagreement in particular cases whether a specific medical condition should be termed a genetic disorder.

Additional genetic disorders worth mentioning are Kallmann syndrome and Pfeiffer syndrome (gene FGFR1), Fuchs corneal dystrophy (gene TCF4), Hirschsprung's disease (genes RET and FECH), Bardet-Biedl syndrome 1 (genes CCDC28B and BBS1), Bardet-Biedl syndrome 10 (gene BBS10), and facioscapulohumeral muscular dystrophy type 2 (genes D4Z4 and SMCHD1). [108]

Genome sequencing can now narrow the search to specific locations in the genome, making it easier to find the mutations that result in a genetic disorder. Newer sequencing procedures, known as next-generation sequencing (NGS), can also detect copy number variants (CNVs) and single nucleotide variants (SNVs) in the same run. This approach analyzes only a small portion of the genome, around 1–2%. The results can be used for clinical diagnosis of a genetic condition, including Usher syndrome, retinal disease, hearing impairments, diabetes, epilepsy, Leigh disease, hereditary cancers, neuromuscular diseases, primary immunodeficiencies, severe combined immunodeficiency (SCID), and diseases of the mitochondria. [109] NGS can also be used to identify carriers of diseases before conception. The diseases that can be detected in this way include Tay-Sachs disease, Bloom syndrome, Gaucher disease, Canavan disease, familial dysautonomia, cystic fibrosis, spinal muscular atrophy, and fragile-X syndrome. NGS panels can be narrowed to look specifically for diseases that are more prevalent in certain ethnic populations. [110]

Comparative genomics studies of mammalian genomes suggest that approximately 5% of the human genome has been conserved by evolution since the divergence of extant lineages approximately 200 million years ago, containing the vast majority of genes. [111] [112] The published chimpanzee genome differs from that of the human genome by 1.23% in direct sequence comparisons. [113] Around 20% of this figure is accounted for by variation within each species, leaving only 1.06% consistent sequence divergence between humans and chimps at shared genes. [114] This nucleotide-by-nucleotide difference is dwarfed, however, by the portion of each genome that is not shared, including around 6% of functional genes that are unique to either humans or chimps. [115]

In other words, the considerable observable differences between humans and chimps may be due as much or more to genome-level variation in the number, function and expression of genes as to DNA sequence changes in shared genes. Indeed, even within humans, a previously unappreciated amount of copy number variation (CNV) has been found, which can make up as much as 5–15% of the human genome. Put differently, the genomes of two humans could differ by up to roughly 500,000,000 base pairs of DNA, some belonging to active genes, others to genes that are inactivated or active at different levels. The full significance of this finding remains to be seen. On average, a typical human protein-coding gene differs from its chimpanzee ortholog by only two amino acid substitutions; nearly one third of human genes have exactly the same protein translation as their chimpanzee orthologs. A major difference between the two genomes is human chromosome 2, which is equivalent to a fusion product of chimpanzee chromosomes 12 and 13 (later renamed chromosomes 2A and 2B, respectively). [116]
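As a rough check on the copy number variation figures above, the following sketch converts the 5–15% range into base pairs. The genome size is an assumption that does not appear in this section: about 3.1 billion base pairs, used here as a round value for the haploid human genome. The constant and function names are hypothetical.

```python
# Assumed round figure for the haploid human genome size (not taken from this section).
ASSUMED_GENOME_SIZE_BP = 3_100_000_000


def percent_of_genome_in_bp(percent: int, genome_size_bp: int = ASSUMED_GENOME_SIZE_BP) -> int:
    """Return the number of base pairs covered by a whole-number percentage of the genome."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    return genome_size_bp * percent // 100


cnv_low_bp = percent_of_genome_in_bp(5)
cnv_high_bp = percent_of_genome_in_bp(15)

print(f"5% of the genome:  {cnv_low_bp:,} bp")
print(f"15% of the genome: {cnv_high_bp:,} bp")
```

With this assumed genome size, the upper end of the range comes to a little under 500 million base pairs, in line with the order of magnitude quoted in the text.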

Humans have undergone an extraordinary loss of olfactory receptor genes during our recent evolution, which explains our relatively crude sense of smell compared to most other mammals. Evolutionary evidence suggests that the emergence of color vision in humans and several other primate species has diminished the need for the sense of smell. [117]

In September 2016, scientists reported that, based on human DNA genetic studies, all non-Africans in the world today can be traced to a single population that exited Africa between 50,000 and 80,000 years ago. [118]

The human mitochondrial DNA is of tremendous interest to geneticists, since it undoubtedly plays a role in mitochondrial disease. It also sheds light on human evolution; for example, analysis of variation in the human mitochondrial genome has led to the postulation of a recent common ancestor for all humans on the maternal line of descent (see Mitochondrial Eve).

Due to the lack of a system for checking for copying errors, [119] mitochondrial DNA (mtDNA) has a more rapid rate of variation than nuclear DNA. This 20-fold higher mutation rate allows mtDNA to be used for more accurate tracing of maternal ancestry. [citation needed] Studies of mtDNA in populations have allowed ancient migration paths to be traced, such as the migration of Native Americans from Siberia [120] or Polynesians from southeastern Asia. [citation needed] It has also been used to show that there is no trace of Neanderthal DNA in the European gene mixture inherited through purely maternal lineage. [121] Because mtDNA is inherited in a restrictive all-or-none manner, this result (no trace of Neanderthal mtDNA) would be likely unless there were a large percentage of Neanderthal ancestry, or there was strong positive selection for that mtDNA. For example, going back 5 generations, only 1 of a person's 32 ancestors contributed to that person's mtDNA, so if one of these 32 was pure Neanderthal, an expected 3% or so of that person's autosomal DNA would be of Neanderthal origin, yet they would have about a 97% chance of having no trace of Neanderthal mtDNA. [citation needed]
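The arithmetic behind the five-generation example can be sketched in a few lines of Python. The sketch assumes that all ancestors in a given generation are distinct individuals (no pedigree collapse) and that the single Neanderthal ancestor is equally likely to occupy any position in the pedigree. The function name is hypothetical.

```python
from fractions import Fraction


def maternal_line_summary(generations: int) -> tuple[int, Fraction, Fraction]:
    """Summarize ancestry `generations` back, assuming all ancestors are distinct.

    Returns (number of ancestors,
             expected autosomal share contributed by each ancestor,
             probability that one given ancestor is NOT the mtDNA source).
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    n_ancestors = 2 ** generations
    autosomal_share = Fraction(1, n_ancestors)
    # Exactly one ancestor per generation lies on the purely maternal line.
    p_not_mtdna_source = Fraction(n_ancestors - 1, n_ancestors)
    return n_ancestors, autosomal_share, p_not_mtdna_source


n, share, p_no_trace = maternal_line_summary(5)
print(f"Ancestors 5 generations back:          {n}")
print(f"Expected autosomal share per ancestor: {float(share):.1%}")
print(f"Chance of no mtDNA from that ancestor: {float(p_no_trace):.1%}")
```

The exact values are 1/32 and 31/32, which round to the approximately 3% and 97% given in the text.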

Epigenetics describes a variety of features of the human genome that transcend its primary DNA sequence, such as chromatin packaging, histone modifications and DNA methylation, and which are important in regulating gene expression, genome replication and other cellular processes. Epigenetic markers strengthen and weaken transcription of certain genes but do not affect the actual sequence of DNA nucleotides. DNA methylation is a major form of epigenetic control over gene expression and one of the most highly studied topics in epigenetics. During development, the human DNA methylation profile experiences dramatic changes. In early germ line cells, the genome has very low methylation levels. These low levels generally describe active genes. As development progresses, parental imprinting tags lead to increased methylation activity. [122] [123]

Epigenetic patterns can be identified between tissues within an individual as well as between individuals themselves. Identical genes that have differences only in their epigenetic state are called epialleles. Epialleles can be placed into three categories: those directly determined by an individual's genotype, those influenced by genotype, and those entirely independent of genotype. The epigenome is also influenced significantly by environmental factors. Diet, toxins, and hormones impact the epigenetic state. Studies in dietary manipulation have demonstrated that methyl-deficient diets are associated with hypomethylation of the epigenome. Such studies establish epigenetics as an important interface between the environment and the genome. [124]

  1. ^"GRCh38.p13". ncbi. Genome Reference Consortium . Retrieved 8 June 2020 .
  2. ^
  3. Brown TA (2002). The Human Genome (2nd ed.). Oxford: Wiley-Liss.
  4. ^ ab
  5. Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA (November 2012). "An integrated map of genetic variation from 1,092 human genomes". Nature. 491 (7422): 56–65. Bibcode:2012Natur.491. 56T. doi:10.1038/nature11632. PMC3498066 . PMID23128226.
  6. ^
  7. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, et al. (October 2015). "A global reference for human genetic variation". Nature. 526 (7571): 68–74. Bibcode:2015Natur.526. 68T. doi:10.1038/nature15393. PMC4750478 . PMID26432245.
  8. ^
  9. Chimpanzee Sequencing Analysis Consortium (2005). "Initial sequence of the chimpanzee genome and comparison with the human genome" (PDF) . Nature. 437 (7055): 69–87. Bibcode:2005Natur.437. 69.. doi: 10.1038/nature04072 . PMID16136131. S2CID2638825.
  10. ^
  11. Varki A, Altheide TK (December 2005). "Comparing the human and chimpanzee genomes: searching for needles in a haystack". Genome Research. 15 (12): 1746–58. doi: 10.1101/gr.3737405 . PMID16339373.
  12. ^
  13. Wade N (23 September 1999). "Number of Human Genes Is Put at 140,000, a Significant Gain". The New York Times.
  14. ^ ab
  15. International Human Genome Sequencing Consortium (October 2004). "Finishing the euchromatic sequence of the human genome". Nature. 431 (7011): 931–45. Bibcode:2004Natur.431..931H. doi: 10.1038/nature03001 . PMID15496913.
  16. ^
  17. Ezkurdia I, Juan D, Rodriguez JM, Frankish A, Diekhans M, Harrow J, Vazquez J, Valencia A, Tress ML (November 2014). "Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes". Human Molecular Genetics. 23 (22): 5866–78. doi:10.1093/hmg/ddu309. PMC4204768 . PMID24939910.
  18. ^
  19. Saey TH (17 September 2018). "A recount of human genes ups the number to at least 46,831". Science News.
  20. ^
  21. Alles J, Fehlmann T, Fischer U, Backes C, Galata V, Minet M, et al. (April 2019). "An estimate of the total number of true human miRNAs". Nucleic Acids Research. 47 (7): 3353–3364. doi:10.1093/nar/gkz097. PMC6468295 . PMID30820533.
  22. ^ abc
  23. Pennisi E (September 2012). "Genomics. ENCODE project writes eulogy for junk DNA". Science. 337 (6099): 1159–1161. doi:10.1126/science.337.6099.1159. PMID22955811.
  24. ^
  25. Zhang S (28 November 2018). "300 Million Letters of DNA Are Missing From the Human Genome". The Atlantic.
  26. ^ abc
  27. International Human Genome Sequencing Consortium (February 2001). "Initial sequencing and analysis of the human genome". Nature. 409 (6822): 860–921. Bibcode:2001Natur.409..860L. doi: 10.1038/35057062 . PMID11237011.
  28. ^International Human Genome Sequencing Consortium Publishes Sequence and Analysis of the Human Genome
  29. ^
  30. Pennisi E (February 2001). "The human genome". Science. 291 (5507): 1177–80. doi:10.1126/science.291.5507.1177. PMID11233420. S2CID38355565.
  31. ^
  32. Molteni M (19 November 2018). "Now You Can Sequence Your Whole Genome For Just $200". Wired.
  33. ^
  34. Wrighton K (February 2021). "Filling in the gaps telomere to telomere". Nature Milestones: Genomic Sequencing: S21.
  35. ^
  36. Pollack A (2 June 2016). "Scientists Announce HGP-Write, Project to Synthesize the Human Genome". New York Times . Retrieved 2 June 2016 .
  37. ^
  38. Boeke JD, Church G, Hessel A, Kelley NJ, Arkin A, Cai Y, et al. (July 2016). "The Genome Project-Write". Science. 353 (6295): 126–7. Bibcode:2016Sci. 353..126B. doi:10.1126/science.aaf6850. PMID27256881. S2CID206649424.
  39. ^
  40. Zhang S (28 November 2018). "300 Million Letters of DNA Are Missing From the Human Genome". The Atlantic . Retrieved 16 August 2019 .
  41. ^
  42. Chaisson MJ, Huddleston J, Dennis MY, Sudmant PH, Malig M, Hormozdiari F, et al. (January 2015). "Resolving the complexity of the human genome using single-molecule sequencing". Nature. 517 (7536): 608–11. Bibcode:2015Natur.517..608C. doi:10.1038/nature13907. PMC4317254 . PMID25383537.
  43. ^
  44. Miga KH, Koren S, Rhie A, Vollger MR, Gershman A, Bzikadze A, et al. (September 2020). "Telomere-to-telomere assembly of a complete human X chromosome". Nature. 585 (7823): 79–84. Bibcode:2020Natur.585. 79M. doi:10.1038/s41586-020-2547-7. PMC7484160 . PMID32663838.
  45. ^Ensembl genome browser release 87 [permanent dead link] (December 2016) for most values Ensembl genome browser release 68 (July 2012) for miRNA, rRNA, snRNA, snoRNA.
  46. ^
  47. Piovesan A, Pelleri MC, Antonaros F, Strippoli P, Caracausi M, Vitale L (February 2019). "On the length, weight and GC content of the human genome". BMC Research Notes. 12 (1): 106. doi:10.1186/s13104-019-4137-z. PMC6391780 . PMID30813969.
  48. ^
  49. Salzberg SL (August 2018). "Open questions: How many genes do we have?". BMC Biology. 16 (1): 94. doi:10.1186/s12915-018-0564-x. PMC6100717 . PMID30124169.
  50. ^
  51. "Gencode statistics, version 28". Archived from the original on 2 March 2018 . Retrieved 12 July 2018 .
  52. ^
  53. "Ensembl statistics for version 92.38, corresponding to Gencode v28" . Retrieved 12 July 2018 .
  54. ^
  55. "NCBI Homo sapiens Annotation Release 108". NIH. 2016.
  56. ^
  57. "CHESS statistics, version 2.0". Center for Computational Biology. Johns Hopkins University.
  58. ^
  59. "Human Genome Project Completion: Frequently Asked Questions". National Human Genome Research Institute (NHGRI) . Retrieved 2 February 2019 .
  60. ^
  61. Christley S, Lu Y, Li C, Xie X (January 2009). "Human genomes as email attachments". Bioinformatics. 25 (2): 274–5. doi: 10.1093/bioinformatics/btn582 . PMID18996942.
  62. ^
  63. Liu Z, Venkatesh SS, Maley CC (October 2008). "Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples". BMC Genomics. 9: 509. doi:10.1186/1471-2164-9-509. PMC2628393 . PMID18973670. , fig. 6, using the Lempel-Ziv estimators of entropy rate.
  64. ^
  65. Waters K (7 March 2007). "Molecular Genetics". Stanford Encyclopedia of Philosophy . Retrieved 18 July 2013 .
  66. ^
  67. Gannett L (26 October 2008). "The Human Genome Project". Stanford Encyclopedia of Philosophy . Retrieved 18 July 2013 .
  68. ^PANTHER Pie Chart at the PANTHER Classification System homepage. Retrieved 25 May 2011
  69. ^List of human proteins in the Uniprot Human reference proteome accessed 28 January 2015
  70. ^
  71. Kauffman SA (March 1969). "Metabolic stability and epigenesis in randomly constructed genetic nets". Journal of Theoretical Biology. 22 (3): 437–67. doi:10.1016/0022-5193(69)90015-0. PMID5803332.
  72. ^
  73. Ohno S (1972). "An argument for the genetic simplicity of man and other mammals". Journal of Human Evolution. 1 (6): 651–662. doi:10.1016/0047-2484(72)90011-5.
  74. ^
  75. Sémon M, Mouchiroud D, Duret L (February 2005). "Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance". Human Molecular Genetics. 14 (3): 421–7. doi: 10.1093/hmg/ddi038 . PMID15590696.
  76. ^ M. Huang, H. Zhu, B. Shen, G. Gao, "A non-random gait through the human genome", 3rd International Conference on Bioinformatics and Biomedical Engineering (UCBBE, 2009), 1–3
  77. ^ ab
  78. Bang ML, Centner T, Fornoff F, Geach AJ, Gotthardt M, McNabb M, Witt CC, Labeit D, Gregorio CC, Granzier H, Labeit S (2001). "The complete gene sequence of titin, expression of an unusual approximately 700-kDa titin isoform, and its interaction with obscurin identify a novel Z-line to I-band linking system". Circulation Research. 89 (11): 1065–72. doi: 10.1161/hh2301.100981 . PMID11717165.
  79. ^ ab
  80. Piovesan A, Caracausi M, Antonaros F, Pelleri MC, Vitale L (2016). "GeneBase 1.1: a tool to summarize data from NCBI gene datasets and its application to an update of human gene statistics". Database: The Journal of Biological Databases and Curation. 2016: baw153. doi:10.1093/database/baw153. PMC5199132 . PMID28025344.
  81. ^Ensembl genome browser (July 2012)
  82. ^
  83. Gregory TR (September 2005). "Synergy between sequence and size in large-scale genomics". Nature Reviews Genetics. 6 (9): 699–708. doi:10.1038/nrg1674. PMID16151375. S2CID24237594.
  84. ^ ab
  85. Palazzo AF, Akef A (June 2012). "Nuclear export as a key arbiter of "mRNA identity" in eukaryotes". Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 1819 (6): 566–77. doi:10.1016/j.bbagrm.2011.12.012. PMID22248619.
  86. ^
  87. Ludwig MZ (December 2002). "Functional evolution of noncoding DNA". Current Opinion in Genetics & Development. 12 (6): 634–9. doi:10.1016/S0959-437X(02)00355-6. PMID12433575.
  88. ^
  89. Martens JA, Laprade L, Winston F (June 2004). "Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene". Nature. 429 (6991): 571–4. Bibcode:2004Natur.429..571M. doi:10.1038/nature02538. PMID15175754. S2CID809550.
  90. ^
  91. Tsai MC, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F, Shi Y, Segal E, Chang HY (August 2010). "Long noncoding RNA as modular scaffold of histone modification complexes". Science. 329 (5992): 689–93. Bibcode:2010Sci. 329..689T. doi:10.1126/science.1192002. PMC2967777 . PMID20616235.
  92. ^
  93. Bartolomei MS, Zemel S, Tilghman SM (May 1991). "Parental imprinting of the mouse H19 gene". Nature. 351 (6322): 153–5. Bibcode:1991Natur.351..153B. doi:10.1038/351153a0. PMID1709450. S2CID4364975.
  94. ^
  95. Kobayashi T, Ganley AR (September 2005). "Recombination regulation by transcription-induced cohesin dissociation in rDNA repeats". Science. 309 (5740): 1581–4. Bibcode:2005Sci. 309.1581K. doi:10.1126/science.1116102. PMID16141077. S2CID21547462.
  96. ^
  97. Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP (August 2011). "A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language?". Cell. 146 (3): 353–8. doi:10.1016/j.cell.2011.07.014. PMC3235919 . PMID21802130.
  98. ^
  99. Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB (2012). "The GENCODE pseudogene resource". Genome Biology. 13 (9): R51. doi:10.1186/gb-2012-13-9-r51. PMC3491395 . PMID22951037.
  100. ^
  101. Gilad Y, Man O, Pääbo S, Lancet D (March 2003). "Human specific loss of olfactory receptor genes". Proceedings of the National Academy of Sciences of the United States of America. 100 (6): 3324–7. Bibcode:2003PNAS..100.3324G. doi:10.1073/pnas.0535697100. PMC152291 . PMID12612342.
  102. ^
  103. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, Barrette TR, Prensner JR, Evans JR, Zhao S, Poliakov A, Cao X, Dhanasekaran SM, Wu YM, Robinson DR, Beer DG, Feng FY, Iyer HK, Chinnaiyan AM (March 2015). "The landscape of long noncoding RNAs in the human transcriptome". Nature Genetics. 47 (3): 199–208. doi:10.1038/ng.3192. PMC4417758 . PMID25599403.
  104. ^
  105. Eddy SR (December 2001). "Non-coding RNA genes and the modern RNA world". Nature Reviews Genetics. 2 (12): 919–29. doi:10.1038/35103511. PMID11733745. S2CID18347629.
  106. ^
  107. Managadze D, Lobkovsky AE, Wolf YI, Shabalina SA, Rogozin IB, Koonin EV (2013). "The vast, conserved mammalian lincRNome". PLOS Computational Biology. 9 (2): e1002917. Bibcode:2013PLSCB. 9E2917M. doi:10.1371/journal.pcbi.1002917. PMC3585383 . PMID23468607.
  108. ^
  109. Palazzo AF, Lee ES (2015). "Non-coding RNA: what is functional and what is junk?". Frontiers in Genetics. 6: 2. doi:10.3389/fgene.2015.00002. PMC4306305 . PMID25674102.
  110. ^
  111. Mattick JS, Makunin IV (April 2006). "Non-coding RNA". Human Molecular Genetics. 15 Spec No 1: R17–29. doi: 10.1093/hmg/ddl046 . PMID16651366.
  112. ^ ab
  113. Bernstein BE, Birney E, Dunham I, Green ED, Gunter C, Snyder M (September 2012). "An integrated encyclopedia of DNA elements in the human genome". Nature. 489 (7414): 57–74. Bibcode:2012Natur.489. 57T. doi:10.1038/nature11247. PMC3439153 . PMID22955616.
  114. ^
  115. Birney E (5 September 2012). "ENCODE: My own thoughts". Ewan's Blog: Bioinformatician at large.
  116. ^
  117. Stamatoyannopoulos JA (September 2012). "What does our genome encode?". Genome Research. 22 (9): 1602–11. doi:10.1101/gr.146506.112. PMC3431477 . PMID22955972.
  118. ^
  119. Carroll SB, Gompel N, Prudhomme B (May 2008). "Regulating Evolution". Scientific American. 298 (5): 60–67. Bibcode:2008SciAm.298e..60C. doi:10.1038/scientificamerican0508-60. PMID18444326.
  120. ^
  121. Miller JH, Ippen K, Scaife JG, Beckwith JR (1968). "The promoter-operator region of the lac operon of Escherichia coli". J. Mol. Biol. 38 (3): 413–20. doi:10.1016/0022-2836(68)90395-1. PMID4887877.
  122. ^
  123. Wright S, Rosenthal A, Flavell R, Grosveld F (1984). "DNA sequences required for regulated expression of beta-globin genes in murine erythroleukemia cells". Cell. 38 (1): 265–73. doi:10.1016/0092-8674(84)90548-8. PMID6088069. S2CID34587386.
  124. ^
  125. Nei M, Xu P, Glazko G (February 2001). "Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms". Proceedings of the National Academy of Sciences of the United States of America. 98 (5): 2497–502. Bibcode:2001PNAS. 98.2497N. doi:10.1073/pnas.051611498. PMC30166 . PMID11226267.
  126. ^
  127. Loots GG, Locksley RM, Blankespoor CM, Wang ZE, Miller W, Rubin EM, Frazer KA (April 2000). "Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons". Science. 288 (5463): 136–40. Bibcode:2000Sci. 288..136L. doi:10.1126/science.288.5463.136. PMID10753117. Summary
  128. ^
  129. Meunier M. "Genoscope and Whitehead announce a high sequence coverage of the Tetraodon nigroviridis genome". Genoscope. Archived from the original on 16 October 2006 . Retrieved 12 September 2006 .
  130. ^
  131. Romero IG, Ruvinsky I, Gilad Y (July 2012). "Comparative studies of gene expression and the evolution of gene regulation". Nature Reviews Genetics. 13 (7): 505–16. doi:10.1038/nrg3229. PMC4034676 . PMID22705669.
  132. ^
  133. Schmidt D, Wilson MD, Ballester B, Schwalie PC, Brown GD, Marshall A, Kutter C, Watt S, Martinez-Jimenez CP, Mackay S, Talianidis I, Flicek P, Odom DT (May 2010). "Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding". Science. 328 (5981): 1036–40. Bibcode:2010Sci. 328.1036S. doi:10.1126/science.1186176. PMC3008766 . PMID20378774.
  134. ^
  135. Wilson MD, Barbosa-Morais NL, Schmidt D, Conboy CM, Vanes L, Tybulewicz VL, Fisher EM, Tavaré S, Odom DT (October 2008). "Species-specific transcription in mice carrying human chromosome 21". Science. 322 (5900): 434–8. Bibcode:2008Sci. 322..434W. doi:10.1126/science.1160930. PMC3717767 . PMID18787134.
  136. ^
  137. Treangen TJ, Salzberg SL (January 2012). "Repetitive DNA and next-generation sequencing: computational challenges and solutions". Nature Reviews Genetics. 13 (1): 36–46. doi:10.1038/nrg3117. PMC3324860 . PMID22124482.
  138. ^
  139. Duitama J, Zablotskaya A, Gemayel R, Jansen A, Belet S, Vermeesch JR, Verstrepen KJ, Froyen G (May 2014). "Large-scale analysis of tandem repeat variability in the human genome". Nucleic Acids Research. 42 (9): 5728–41. doi:10.1093/nar/gku212. PMC4027155 . PMID24682812.
  140. ^
  141. Pierce BA (2012). Genetics : a conceptual approach (4th ed.). New York: W.H. Freeman. pp. 538–540. ISBN978-1-4292-3250-0 .
  142. ^
  143. Bennett EA, Keller H, Mills RE, Schmidt S, Moran JV, Weichenrieder O, Devine SE (December 2008). "Active Alu retrotransposons in the human genome". Genome Research. 18 (12): 1875–83. doi:10.1101/gr.081737.108. PMC2593586 . PMID18836035.
  144. ^
  145. Liang KH, Yeh CT (2013). "A gene expression restriction network mediated by sense and antisense Alu sequences located on protein-coding messenger RNAs". BMC Genomics. 14: 325. doi:10.1186/1471-2164-14-325. PMC3655826 . PMID23663499.
  146. ^
  147. Brouha B, Schustak J, Badge RM, Lutz-Prigge S, Farley AH, Moran JV, Kazazian HH (April 2003). "Hot L1s account for the bulk of retrotransposition in the human population". Proceedings of the National Academy of Sciences of the United States of America. 100 (9): 5280–5. Bibcode:2003PNAS..100.5280B. doi:10.1073/pnas.0831042100. PMC154336 . PMID12682288.
  148. ^
  149. Barton NH, Briggs DE, Eisen JA, Goldstein DB, Patel NH (2007). Evolution. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. ISBN978-0-87969-684-9 .
  150. ^
  151. NCBI. "GRCh38 – hg38 – Genome – Assembly – NCBI". . Retrieved 15 March 2019 .
  152. ^
  153. "from Bill Clinton's 2000 State of the Union address". Archived from the original on 21 February 2017 . Retrieved 14 June 2007 .
  154. ^
  155. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. (November 2006). "Global variation in copy number in the human genome". Nature. 444 (7118): 444–54. Bibcode:2006Natur.444..444R. doi:10.1038/nature05329. PMC2669898 . PMID17122850.
  156. ^
  157. "What's a Genome?". 15 January 2003 . Retrieved 31 May 2009 .
  158. ^
  159. NCBI_user_services (29 March 2004). "Mapping Factsheet". Archived from the original on 19 July 2010 . Retrieved 31 May 2009 .
  160. ^
  161. "About the Project". HapMap . Retrieved 31 May 2009 .
  162. ^





Polytene Chromosomes, Heterochromatin, and Position Effect Variegation

This chapter shows that questions about the organization of heterochromatin are prominent in general thinking about how genetic material is organized in chromosomes. The discovery of repetitive DNA sequences in the eukaryotic genome, together with the method of in situ hybridization, provided a broader basis for understanding heterochromatin structure and revealed its richness in repetitive DNA. Position effect is a remarkable experimental model for studying the physiological role of heterochromatin. When a euchromatic region is transposed next to heterochromatin, the genes immediately adjacent to the heterochromatin can become inactivated. The degree of genetic inactivation can be modified by various agents, including variation in the amount of heterochromatin in the cell. Position effect therefore makes it possible to clarify influences on gene activity in both trans and cis positions. The chapter presents studies of the morphological and morphofunctional organization of chromosome fragments in close proximity to heterochromatin, which demonstrated that genetic inactivation is related to compaction of the chromosome region and to its acquisition of the properties of heterochromatin. A major breakthrough in studies of heterochromatin and position effect came with the discovery of the complex gene system that affects the expression of genetic inactivation and, consequently, the degree of chromatin compaction. The chapter concludes that new information calling for closer scrutiny has accumulated in heterochromatin research.