Refining the Y chromosome phylogeny with southern African sequences.
The recent availability of large-scale sequence data for the human Y chromosome has revolutionized analyses of this non-recombining, paternally inherited chromosome and the insights gained from it. However, studies to date have focused on Eurasian variation, and hence the diversity of early-diverging branches found in Africa has not been adequately documented. Here, we analyze over 900 kb of Y chromosome sequence obtained from 547 individuals from southern African Khoisan- and Bantu-speaking populations, identifying 232 new sequences from basal haplogroups A and B. We identify new clades in the phylogeny and find an older age for the root and substantially older ages for some individual haplogroups. Furthermore, while haplogroup B2a is traditionally associated with the spread of Bantu speakers, we find that it probably also existed in Khoisan groups before the arrival of Bantu speakers. Finally, there is pronounced variation in branch length between major haplogroups; in particular, haplogroups associated with Bantu speakers have significantly longer branches. Technical artifacts cannot explain this branch length variation, which instead likely reflects aspects of the demographic history of Bantu speakers, such as recent population expansion and an older average paternal age. The influence of demographic factors on branch length variation has broader implications both for the human Y phylogeny and for similar analyses of other species.