The Driver of Extreme Human-Specific Olduvai Repeat Expansion Remains Highly Active in the Human Genome.
Sequences encoding Olduvai protein domains (formerly DUF1220) show the greatest human lineage-specific increase in copy number of any coding region in the genome and have been associated, in a dosage-dependent manner, with brain size, cognitive aptitude, autism, and schizophrenia. Tandem intragenic duplications of a three-domain block, termed the Olduvai triplet, in four NBPF genes in the chromosomal 1q21.1-0.2 region, are primarily responsible for the striking human-specific copy number increase. Interestingly, most of the Olduvai triplets are adjacent to, and transcriptionally coregulated with, three human-specific NOTCH2NL genes that have been shown to promote cortical neurogenesis. Until now, the underlying genomic events that drove the Olduvai hyperamplification in humans have remained unexplained. Here, we show that the presence or absence of an alternative first exon of the Olduvai triplet perfectly discriminates between amplified (58/58) and unamplified (0/12) triplets. We provide sequence and breakpoint analyses that suggest the alternative exon was produced by an nonallelic homologous recombination-based mechanism involving the duplicative transposition of an existing Olduvai exon found in the CON3 domain, which typically occurs at the C-terminal end of NBPF genes. We also provide suggestive in vitro evidence that the alternative exon may promote instability through a putative G-quadraplex (pG4)-based mechanism. Lastly, we use single-molecule optical mapping to characterize the intragenic structural variation observed in NBPF genes in 154 unrelated individuals and 52 related individuals from 16 families and show that the presence of pG4-containing Olduvai triplets is strongly correlated with high levels of Olduvai copy number variation. These results suggest that the same driver of genomic instability that allowed the evolutionarily recent, rapid, and extreme human-specific Olduvai expansion remains highly active in the human genome.