Earlier this year I expressed excitement that the 1000 Genomes, “A Deep Catalog of Human Genetic Variation,” was finally going to add some more Indian populations. There was a sample of Gujaratis from Houston, but that’s a rather narrow slice of ~1 billion Indians, and nearly 1.4 billion South Asians. The populations which were going to be added were Kayasthas from West Bengal, Marathas from Maharashtra, and Ahom from Assam.

Unfortunately, as I commented a few days ago, that looks like it’s not happening. The Indian population collections have been removed from the website, and replaced by Sri Lankan Sinhalese and Tamils from the United Kingdom, and Bangladeshis. The Pakistani collection is already in process, as they’re getting the samples from Lahore.

This is really sad. Apparently objections from the government of India and bureaucratic impasses made it so that the Human Genome Diversity Project had to use Pakistani populations as proxies for South Asians. This is acceptable, but the Pakistani populations are on the margin of the distribution of genetic variation in South Asian populations, just like the Bangladeshi populations. This stands to reason: they’re marginally located geographically. The Marathas in particular would have been nice, since they’re probably much more typical of South Asia. Typicality matters because South Asians have enough genetic diversity that it probably is something one should consider when controlling for population structure in medical genetics. For example, there is some data out of Britain that Bangladeshis have a higher risk for diabetes than Pakistanis, with all other factors controlled. This may be due to cultural differences, or it may be due to genetics. Until you survey genetic variation within a set of populations you’ll never know which.

When I first began blogging about genetics here some commenters posted frankly paranoid rants about how the new genomics was going to enable a biological weapons program against India by the I.S.I. This is stupid. Pakistanis and Indians may differ, but they are rather similar, and there’s not much difference between ethnic Punjabis on either side of the border. But if you do have paranoid fantasies, don’t worry. It looks like if you want to get genetic information your best bet is to go to the non-Indian states of South Asia. By the end of the year you’ll be able to download 100 full genome sequences of Pakistani Punjabis! I suppose that’s part of some nefarious plan….

In any case, on a positive note I don’t think that the Indian establishment’s intransigence on this issue matters. There are now millions of South Asians of various ethnicities across the world. The amateur Harappa Ancestry Project has over 100 genotypes all by itself. I suspect that at some point in the near future the government of the United States or the United Kingdom could fund genomics projects focused on ethnicities which are under-represented in public databases because of politics abroad. Full genome sequences will converge upon ~$1,000 in the next 5 years (they’re currently ~$20,000 or so per person).

Addendum: If you are unconvinced as to my confidence in the very low risk of biological weapons, download my genotype and send it to the I.S.I., explaining that I’m an anti-Muslim apostate with right-wing American political views. That’s true. If you want to goad them on, tell them I’m anti-Pakistani, and that I fantasize about building a Ram Temple in Islamabad. That’s not really true, but who knows what people will believe?

