In total, 10,330,879 AT-TTUs were detected genome-wide, of which the majority (9,936,861; 96.18%) were arranged in 1,390,055 colonies (Fig. 1). The AT-TTUs were distributed across all chromosomes (Fig. 2).
Several of the large and medium-size AT-TTU colonies coincided with extensive dynamicity in great apes
Several of the large and medium-size AT-TTU colonies in human were also detected in other great apes (Table 1). Exceedingly dynamic events were detected across those colonies, affecting both the AT-TTUs and the sequences flanking those units. Across the colonies, the AT-TTUs were either pure or overlaps of two or more pure units.
Table 1
Several large and medium-size AT-TTU colonies in human and their corresponding colonies in other primates.
| Colony size in human (a) | Colony formula (b): Human | Colony formula: Chimpanzee | Colony formula: Gorilla | Colony formula: Macaque | Chr. No. | Location (colony interval) | λ value |
|---|---|---|---|---|---|---|---|
| C718 | [(ATT)2]32 [(ATA)2]5 [(AAT)2]11 [(TAA)2]2 [(TTA)2]318 [(TAT)2]350 | [(ATT)2]21 [(ATA)2]5 [(AAT)2]3 [(TAA)2]2 [(TTA)2]262 [(TAT)2]288 | [(ATT)2]3 [(ATA)2]2 [(TAA)2]1 [(TTA)2]4 [(TAT)2]7 | | 11 | 11:114603789–114625648 | 75.27 |
| C457 | [(ATT)2]13 [(ATA)2]160 [(AAT)2]40 [(TAA)2]43 [(TTA)2]8 [(TAT)2]193 | | | | 22 | 22:18728881–18733753 | 16.78 |
| C350 | [(ATT)2]12 [(ATA)2]165 [(AAT)2]15 [(TAA)2]11 [(TTA)2]11 [(TAT)2]136 | | | | 11 | 11:96561723–96568076 | 21.88 |
| C317 | [(ATT)2]115 [(ATA)2]59 [(AAT)2]8 [(TAA)2]4 [(TTA)2]14 [(TAT)2]117 | [(ATT)2]102 [(ATA)2]47 [(AAT)2]5 [(TAA)2]3 [(TTA)2]12 [(TAT)2]103 | | | 12 | 12:79456025–79468112 | 41.62 |
| C287 | [(ATT)2]11 [(ATA)2]112 [(AAT)2]30 [(TAA)2]13 [(TTA)2]16 [(TAT)2]105 | [(ATT)2]2 [(ATA)2]17 [(AAT)2]1 [(TAA)2]1 [(TTA)2]2 [(TAT)2]24 | | [(ATT)2]2 [(ATA)2]3 [(AAT)2]2 [(TAA)2]1 [(TTA)2]2 [(TAT)2]6 | X | X:143653179–143659751 | 22.63 |
| C275 | [(ATT)2]6 [(ATA)2]135 [(AAT)2]13 [(TAA)2]10 [(TTA)2]6 [(TAT)2]105 | | | | 22 | 22:18891229–18896934 | 19.65 |
| C267 | [(ATT)2]4 [(ATA)2]151 [(AAT)2]4 [(TAA)2]4 [(TTA)2]5 [(TAT)2]99 | | | | 2 | 2:16146128–16148756 | 9.05 |
| C255 | [(ATA)2]4 [(AAT)2]247 [(TAA)2]2 [(TTA)2]1 [(TAT)2]1 | [(ATA)2]2 [(AAT)2]16 [(TAA)2]1 [(TTA)2]1 [(TAT)2]1 | | | 2 | 2:231822812–231850574 | 95.60 |
| C212 | [(ATA)2]101 [(AAT)2]4 [(TAA)2]6 [(TTA)2]2 [(TAT)2]99 | [(ATT)2]20 [(ATA)2]95 [(AAT)2]20 [(TAA)2]16 [(TTA)2]8 [(TAT)2]88 | | [(ATT)2]1 [(ATA)2]17 [(AAT)2]1 [(TAA)2]3 [(TTA)2]1 [(TAT)2]20 | X | X:108772450–108776693 | 14.61 |
| C200 | [(ATT)2]4 [(ATA)2]77 [(AAT)2]76 [(TAA)2]9 [(TTA)2]2 [(TAT)2]32 | [(ATT)2]2 [(ATA)2]72 [(AAT)2]70 [(TAA)2]7 [(TTA)2]2 [(TAT)2]30 | | [(ATA)2]80 [(AAT)2]78 [(TAA)2]10 [(TTA)2]3 [(TAT)2]29 | 7 | 7:37785667–37794029 | 28.80 |
| C198 | [(ATT)2]9 [(ATA)2]62 [(AAT)2]14 [(TAA)2]15 [(TTA)2]5 [(TAT)2]93 | | | | 22 | 22:18235269–18238724 | 11.90 |
| C195 | [(ATT)2]12 [(ATA)2]61 [(AAT)2]9 [(TAA)2]6 [(TTA)2]15 [(TAT)2]92 | [(ATT)2]10 [(ATA)2]48 [(AAT)2]3 [(TAA)2]4 [(TTA)2]10 [(TAT)2]58 | [(ATT)2]6 [(ATA)2]31 [(AAT)2]4 [(TAA)2]4 [(TTA)2]4 [(TAT)2]17 | | 10 | 10:67724852–67731164 | 21.74 |
| C190 | [(ATT)2]2 [(ATA)2]38 [(AAT)2]5 [(TAA)2]6 [(TTA)2]3 [(TAT)2]136 | | | | 2 | 2:194462444–194465074 | 9.06 |
| C184 | [(ATT)2]34 [(ATA)2]18 [(AAT)2]30 [(TAA)2]4 [(TTA)2]61 [(TAT)2]37 | [(ATT)2]23 [(ATA)2]10 [(AAT)2]19 [(TAA)2]3 [(TTA)2]34 [(TAT)2]24 | [(ATT)2]19 [(ATA)2]10 [(AAT)2]17 [(TAA)2]2 [(TTA)2]26 [(TAT)2]19 | [(ATT)2]6 [(ATA)2]1 [(AAT)2]3 [(TAA)2]1 [(TTA)2]2 [(TAT)2]5 | 6 | 6:77601972–77604184 | 7.62 |
| C182 | [(ATT)2]13 [(ATA)2]51 [(AAT)2]17 [(TAA)2]16 [(TTA)2]12 [(TAT)2]73 | [(ATT)2]6 [(ATA)2]65 [(AAT)2]12 [(TAA)2]11 [(TTA)2]9 [(TAT)2]38 | [(ATT)2]4 [(ATA)2]31 [(AAT)2]2 [(TAA)2]2 [(TTA)2]6 [(TAT)2]22 | [(ATT)2]3 [(ATA)2]11 [(AAT)2]2 [(TAA)2]3 [(TTA)2]7 [(TAT)2]15 | 14 | 14:52527096–52531874 | 16.45 |
| C175 | [(ATT)2]5 [(ATA)2]68 [(AAT)2]5 [(TAA)2]7 [(TTA)2]23 [(TAT)2]67 | | | | 7 | 7:54395223–54397576 | 8.10 |
| C173 | [(ATA)2]69 [(AAT)2]64 [(TAA)2]18 [(TAT)2]22 | | | | 16 | 16:18565943–18570325 | 15.09 |
| C161 | [(ATT)2]68 [(ATA)2]17 [(AAT)2]2 [(TAA)2]2 [(TTA)2]8 [(TAT)2]64 | [(ATT)2]65 [(ATA)2]17 [(AAT)2]2 [(TAA)2]2 [(TTA)2]8 [(TAT)2]64 | [(ATT)2]68 [(ATA)2]17 [(AAT)2]3 [(TAA)2]2 [(TTA)2]4 [(TAT)2]65 | | 3 | 3:8693642–8702099 | 29.12 |
(a) Colony size, chromosomal location, colony interval, and λ value are based on the human genome as reference. The corresponding colonies in other species were identified using BLASTN. Cells were left blank where the colony was only partially sequenced (such as C287, C212, and C200 in gorilla) or where the species lacked the corresponding colony. The Poisson probability for all the colony sizes in this table is inherently zero. None of the colonies in this table were detected in mouse lemur or mouse.
(b) Formulas represent absolute numbers of units, regardless of whether the units are pure or overlapping.
The Poisson probability values calculated for the occurrence of the colonies were inherently zero.
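As a rough illustration of why such probabilities come out as zero, the sketch below evaluates a Poisson probability in log space for a few colony sizes and λ values copied from Table 1. It assumes that the quantity of interest is P(X = k) for X ~ Poisson(λ), with k the colony size; the exact test used is not specified in this section, and the function name `poisson_log10_pmf` is hypothetical.

```python
import math

def poisson_log10_pmf(k: int, lam: float) -> float:
    """Return log10 of P(X = k) for X ~ Poisson(lam).

    Working in log space avoids the floating-point underflow that
    would turn very small probabilities into exactly 0.0.
    """
    log_p = k * math.log(lam) - lam - math.lgamma(k + 1)
    return log_p / math.log(10)

# (colony size in human, lambda value) pairs copied from Table 1.
# Assumption: the colony size is used directly as the observed count k.
colonies = {
    "C718": (718, 75.27),
    "C184": (184, 7.62),
    "C161": (161, 29.12),
}

for name, (size, lam) in colonies.items():
    print(f"{name}: log10 P(X = {size}) = {poisson_log10_pmf(size, lam):.1f}")
```

Under this assumption, the probability for the largest colony lies far below the smallest value representable in double precision, so a direct evaluation would return exactly zero.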
The largest AT-TTU colony in human was a compound colony of 718 units (C718), located on chromosome 11, which was exceedingly dynamic in human and chimpanzee and, to a far lesser extent, in gorilla. This colony reached maximum complexity and size in human (Fig. 4).
The absolute number of the AT-TTUs and the distribution of the units in the pure and overlapping compartments were exceedingly dynamic across those species, adding multiple layers of complexity to the events and leading to massively divergent compositions. Most of the units in C718 and its orthologous colonies were in the overlapping compartment (Figs. 4 and 5A). The immediate flanking sequences of the overlapping units conformed to the flanking sequences of the involved pure units, and were significantly dynamic with respect to mutations (Fig. 5B).
Models proposed for the evolution of pure and overlapping units.
The pure units were inverted or palindromic sequences of one another, and probably emerged from, and resulted in, the DNA breakage and recombination events inherent to inverted and palindromic sequences. Examples are the pure units TTATTA and ATTATT (inversion), and TTATTA and TAATAA (palindrome).
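The inversion and palindrome relationships described above can be checked with a short sketch. It assumes that the six pure units are the hexamers appearing in the Table 1 formulas, that "inversion" means the reversed sequence, and that "palindrome" means the reverse complement; the helper names are hypothetical.

```python
# Assumption: the six pure units are the hexamers from the Table 1 formulas,
# i.e. (ATT)2, (ATA)2, (AAT)2, (TAA)2, (TTA)2, and (TAT)2.
PURE_UNITS = ("ATTATT", "ATAATA", "AATAAT", "TAATAA", "TTATTA", "TATTAT")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse(seq: str) -> str:
    """Return the sequence read backwards (the 'inversion' case)."""
    return seq[::-1]

def reverse_complement(seq: str) -> str:
    """Return the reverse complement (the 'palindrome' case)."""
    return seq.translate(_COMPLEMENT)[::-1]

for unit in PURE_UNITS:
    print(unit, "inversion:", reverse(unit),
          "palindrome:", reverse_complement(unit))
```

Under these assumptions, the set of six units is closed under both operations: reversing or reverse-complementing any pure unit yields another (or the same) pure unit.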
Overlapping units were a consequence of unequal crossovers among the pure units. For example, in C718, the most prevalent overlapping unit, TTATTAT, was the consequence of unequal crossovers between the pure units TTATTA and TATTAT (Fig. 5A). In another example in C718, the overlapping unit AATAATTATTAT was the consequence of several unequal crossovers across units (Fig. 5A). It is conceivable that the reverse of the processes leading to the overlapping units resulted in the re-emergence of pure units.
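To make the relationship between pure and overlapping units concrete, the following sketch scans a sequence for the six pure hexamers and merges hits that share bases into overlapping blocks. It is an illustration only: the detection procedure actually used is not described in this section, the unit list is taken from the Table 1 formulas, and the function names are hypothetical.

```python
# Assumption: pure units are the six hexamers from the Table 1 formulas.
PURE_UNITS = ("ATTATT", "ATAATA", "AATAAT", "TAATAA", "TTATTA", "TATTAT")
UNIT_LEN = 6

def find_pure_units(seq: str) -> list[tuple[int, str]]:
    """Return (start, unit) for every pure hexamer, overlapping hits included."""
    return [
        (i, seq[i:i + UNIT_LEN])
        for i in range(len(seq) - UNIT_LEN + 1)
        if seq[i:i + UNIT_LEN] in PURE_UNITS
    ]

def merge_overlaps(hits: list[tuple[int, str]]) -> list[tuple[int, int, int]]:
    """Merge hits that share at least one base into (start, end, n_units) blocks.

    A block with n_units == 1 is a pure unit; n_units >= 2 is an overlapping unit.
    """
    blocks: list[tuple[int, int, int]] = []
    for start, _unit in hits:
        end = start + UNIT_LEN
        if blocks and start < blocks[-1][1]:
            prev_start, prev_end, n = blocks[-1]
            blocks[-1] = (prev_start, max(prev_end, end), n + 1)
        else:
            blocks.append((start, end, 1))
    return blocks

for example in ("TTATTAT", "AATAATTATTAT"):
    hits = find_pure_units(example)
    print(example, hits, merge_overlaps(hits))
```

With this counting, TTATTAT resolves into the two pure units TTATTA and TATTAT, and AATAATTATTAT into four pure units forming a single overlapping block. Whether the colony formulas in Table 1 count units in exactly this way is an assumption.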
The flanking sequences of the units were also highly dynamic (Fig. 5B), signifying the occurrence of crossovers at the sites of the AT-TTUs, coupled with breakage and repair at and around these sites.
Coincidence of some of the colonies beyond great apes.
Several colonies, such as C212, C200, and C184, were also detected beyond great apes, in macaque (Table 1). As an example, the C184 colonies were shared dynamically among human, chimpanzee, gorilla, and macaque, with a directional trend of increasing complexity of the events and units toward human (Fig. 6). Pure and overlapping units were also detected across this colony in human and other primates. For example, the overlapping unit TATTATTA was the consequence of unequal crossovers between the pure units TATTAT, ATTATT, and TTATTA.
Colonies that were detected in human and not the other five species studied.
We also detected many colonies that were found in human only (Table 1), examples of which are visualized for C457 (Fig. 7A) and C190 (Fig. 7B). In those colonies, we detected consecutive pure units recombining with each other, as well as pure and overlapping units recombining with each other.
AT-TTUs are a mechanism for the emergence of AT short tandem repeats (STRs).
The AT-TTUs, and the coupled unequal crossovers and recombination at these sites, result in the emergence of STRs (repeats of ≥ 3). For example, in C184, the (TTA)3 STR could be a consequence of unequal crossovers through various paths (Fig. 8A and B). In other examples, in C457 and C190, unequal crossovers gave rise to overlapping units that led to the emergence of several (ATA)3 STRs (Fig. 8C, D, and E). We detected the pure units and intermediate overlapping units necessary for the emergence of STRs in the same (or orthologous) colonies in which the STRs were detected.
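The distinction between a two-repeat unit and an STR of three or more repeats can be sketched with a simple regular-expression scan. This is a minimal illustration, assuming the six trinucleotide motifs from the Table 1 formulas and perfect (uninterrupted) repeats; the function name `find_strs` and the example sequences are hypothetical.

```python
import re

# Assumption: the motifs of interest are the six trinucleotides in Table 1.
MOTIFS = ("ATT", "ATA", "AAT", "TAA", "TTA", "TAT")

def find_strs(seq: str, min_repeats: int = 3) -> list[tuple[int, str, int]]:
    """Return (start, motif, n_repeats) for perfect tandem runs of each motif
    with at least min_repeats copies."""
    hits = []
    for motif in MOTIFS:
        pattern = re.compile(f"(?:{motif}){{{min_repeats},}}")
        for match in pattern.finditer(seq):
            hits.append((match.start(), motif, len(match.group()) // len(motif)))
    return sorted(hits)

# A two-repeat unit (TTA)2 is not reported as an STR ...
print(find_strs("GGTTATTACC"))
# ... but one additional TTA copy yields a (TTA)3 STR.
print(find_strs("GGTTATTATTACC"))
```

The scan treats each motif in its own reading frame, so a run such as TTATTATTA is reported as (TTA)3 only, because the shifted frames TAT and ATT each complete just two copies within it.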