Population-genetic properties of differentiated human copy-number polymorphisms.
Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and recent discoveries have shown that several of these copy-number polymorphisms (CNPs) are associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite recent advances, significant biases remain in CNP discovery and genotyping. We developed a genotyping method based on single-channel intensity data, benchmarked it against copy numbers determined from sequencing read depth, and used it to obtain genotypes for 1495 CNPs in 487 human DNA samples of diverse ethnic backgrounds. The genotyping microarray contained CNPs in segmental duplication-rich regions and insertions of sequences not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification than frequency-matched SNPs (p = 0.0026). Although copy number at biallelic CNPs correlates strongly with flanking SNP genotypes, this is not true of most multicopy CNPs (only 40% with r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance such as OCLN, which encodes a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association, and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.
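As an illustration of the r > 0.8 criterion for correlation between copy number and flanking SNP genotypes, the following minimal Python sketch computes the Pearson correlation between integer copy numbers and each flanking SNP and reports the strongest one. It is not the study's pipeline. The function names (`pearson_r`, `best_tag_r`), the 0/1/2 genotype coding, the use of the absolute value of r, and the toy data are all assumptions made for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic site carries no correlation signal
    return sxy / sqrt(sxx * syy)

def best_tag_r(copy_number, flanking_snps):
    """Largest |r| between copy number and any flanking SNP.

    Assumption: SNP genotypes are coded 0/1/2 (count of one allele), so the
    sign of r depends on an arbitrary allele choice and is discarded.
    """
    return max((abs(pearson_r(copy_number, snp)) for snp in flanking_snps),
               default=0.0)

# Hypothetical toy data: 8 samples, 2 flanking SNPs.
copy_number = [2, 2, 3, 3, 4, 4, 2, 3]
flanking_snps = [
    [0, 0, 1, 1, 2, 2, 0, 1],  # tracks copy number exactly
    [0, 1, 0, 1, 0, 1, 0, 1],  # only weakly related to copy number
]

r = best_tag_r(copy_number, flanking_snps)
is_tagged = r > 0.8  # threshold quoted in the abstract
```

A CNP for which no flanking SNP passes the threshold would, under this simplified criterion, be one that cannot be well imputed from SNP genotypes and would need to be genotyped directly.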