A second generation human haplotype map of over 3.1 million SNPs
Frazer KA., Ballinger DG., Cox DR., Hinds DA., Stuve LL., Gibbs RA., Belmont JW., Boudreau A., Hardenbol P., Leal SM., Pasternak S., Wheeler DA., Willis TD., Yu F., Yang H., Zeng C., Gao Y., Hu H., Hu W., Li C., Lin W., Liu S., Pan H., Tang X., Wang J., Wang W., Yu J., Zhang B., Zhang Q., Zhao H., Zhao H., Zhou J., Gabriel SB., Barry R., Blumenstiel B., Camargo A., Defelice M., Faggart M., Goyette M., Gupta S., Moore J., Nguyen H., Onofrio RC., Parkin M., Roy J., Stahl E., Winchester E., Ziaugra L., Altshuler D., Shen Y., Yao Z., Huang W., Chu X., He Y., Jin L., Liu Y., Shen Y., Sun W., Wang H., Wang Y., Wang Y., Xiong X., Xu L., Waye MMY., Tsui SKW., Xue H., Wong JTF., Galver LM., Fan JB., Gunderson K., Murray SS., Oliphant AR., Chee MS., Montpetit A., Chagnon F., Ferretti V., Leboeuf M., Olivier JF., Phillips MS., Roumy S., Sallée C., Verner A., Hudson TJ., Kwok PY., Cai D., Koboldt DC., Miller RD., Pawlikowska L., Taillon-Miller P., Xiao M., Tsui LC., Mak W., You QS., Tam PKH., Nakamura Y., Kawaguchi T., Kitamoto T., Morizono T., Nagashima A., Ohnishi Y.
We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations. ©2007 Nature Publishing Group.
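
The "average maximum r²" figures quoted above can be illustrated with a minimal sketch. It assumes phased haplotypes coded as 0/1 alleles and uses the standard pairwise definition r² = D² / (pA(1 - pA) pB(1 - pB)), with D = pAB - pA pB. The function names and the toy data are hypothetical and are not taken from the paper or from any HapMap software; the paper's own estimation procedure may differ in detail.

```python
def r_squared(a, b):
    """Pairwise r^2 between two biallelic SNPs.

    a, b: equal-length sequences of 0/1 alleles, one entry per phased haplotype.
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("haplotype vectors must be non-empty and equal in length")
    p_a = sum(a) / n                                  # frequency of allele 1 at SNP A
    p_b = sum(b) / n                                  # frequency of allele 1 at SNP B
    p_ab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                    # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    return d * d / denom


def max_r_squared(target, tags):
    """Best r^2 between one untyped SNP and any SNP in a typed (tag) set."""
    return max(r_squared(target, t) for t in tags)


def average_max_r_squared(targets, tags):
    """Mean, over untyped SNPs, of each SNP's best r^2 with the tag set."""
    return sum(max_r_squared(t, tags) for t in targets) / len(targets)


# Toy data: 8 haplotypes, hypothetical alleles.
untyped = [0, 0, 1, 1, 0, 1, 0, 1]
weak_tag = [0, 1, 0, 1, 0, 1, 0, 1]
perfect_tag = [1, 1, 0, 0, 1, 0, 1, 0]   # complement of `untyped`, so r^2 is 1

print(r_squared(untyped, weak_tag))
print(max_r_squared(untyped, [weak_tag, perfect_tag]))
```

Under this definition, a SNP counts as well captured when at least one typed SNP reaches a high r² with it, which is why the summary statistic takes the maximum per SNP before averaging.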