Compositional Structure of the Genome: A Review
Metadata
Author
Bernaola-Galván, Pedro; Carpena, Pedro; Gómez Martín, Cristina; Oliver Jiménez, José Lutgardo
Publisher
MDPI
Subject
DNA compositional structure; Sequence compositional complexity; Segment compositional signature; Hierarchical genome structure; Evolutionary adaptive trends
Date
2023-06-13
Bibliographic reference
Bernaola-Galván, P.; Carpena, P.; Gómez-Martín, C.; Oliver, J.L. Compositional Structure of the Genome: A Review. Biology 2023, 12, 849. https://doi.org/10.3390/biology12060849
Sponsorship
Spanish Minister of Science, Innovation and Universities (former Spanish Minister of Economy and Competitiveness; Project AGL2017-88702-C2-2-R) and Stichting Cancer Center Amsterdam for CGM (CCA2021-9-77); the Spanish Ministerio de Ciencia e Innovación (Grant no. PID2020-116711GB-I00); Spanish Junta de Andalucía (Grant no. FQM-362)
Abstract
As the genome carries the historical information of a species’ biotic and environmental
interactions, analyzing changes in genome structure over time by using powerful statistical physics
methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures
of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies
tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure
with heterogeneities at different length scales that range from a few nucleotides to tens of millions of
them. Fluctuation analysis reveals that these compositional structures can be classified into three main
categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)), primarily attributed
to the alternation of coding and noncoding regions, the densities of interspersed or tandem repeats, etc.;
(2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of
tens of megabase pairs (Mbp) or even larger. The obtained isochore and superstructure coordinates in
the first complete T2T human sequence are now shared in a public database. In this way, interested
researchers can use T2T isochore data, as well as the annotations for different genome elements, to
check a specific hypothesis about genome structure. Similarly to other levels of biological organization,
a hierarchical compositional structure is prevalent in the genome. Once the compositional
structure of a genome is identified, various measures can be derived to quantify the heterogeneity
of such structure. The distribution of segment G+C content has recently been proposed as a new
genome signature that proves to be useful for comparing complete genomes. Another meaningful
measure is the sequence compositional complexity (SCC), which has been used for genome structure
comparisons. Lastly, we review the recent genome comparisons in species of the ancient phylum
Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed
positive trends towards higher genome complexity. These findings provide the first evidence for a
driven progressive evolution of genome compositional structure.
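
To make the measures named in the abstract more concrete, the following minimal sketch illustrates one common formulation. It assumes that SCC is taken as the entropy of the whole sequence minus the length-weighted mean entropy of its segments (a generalized Jensen-Shannon divergence), and that a single step of entropic segmentation picks the cut that maximizes this quantity. The function names and the toy sequence are hypothetical, and the statistical significance test that a full segmentation algorithm would apply before accepting a cut is omitted.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the nucleotide frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compositional_complexity(seq, boundaries):
    """Entropy of the whole sequence minus the length-weighted mean entropy
    of the segments delimited by `boundaries` (assumed SCC formulation)."""
    cuts = [0] + sorted(boundaries) + [len(seq)]
    segments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    n = len(seq)
    within = sum(len(s) / n * shannon_entropy(s) for s in segments)
    return shannon_entropy(seq) - within


def best_cut(seq):
    """Cut position that maximizes the two-segment divergence.
    This is a single segmentation step with no significance test."""
    return max(range(1, len(seq)),
               key=lambda i: compositional_complexity(seq, [i]))


# Hypothetical toy sequence: an A+T-only domain followed by a G+C-only domain.
toy = "AT" * 10 + "GC" * 10
cut = best_cut(toy)
scc = compositional_complexity(toy, [cut])
```

On this toy input the cut falls at the junction between the two domains, whereas a sequence treated as a single segment has zero complexity under this definition. Real genomes would call for recursive application of the cut step and a significance threshold, neither of which is shown here.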