Two issues compel this post. One is practical. The other is more, shall I say, spiritual (or at least fun!). As for the first, a few weeks ago I reviewed a paper which reported that the response to a particular leukemia treatment regimen depended on the amount of Native American ancestry an individual had. One has to be specific here, because many people who are white or black American have significant Native American ancestry (Brett Favre’s paternal grandfather was Choctaw), and many people who identify as Native American may not have as much Native American ancestry as others. But for the purposes of this blog post, I want to bring to your attention the figure above, which I extracted from the paper. Its implications may pose a major problem in the future for South Asian biomedical research in the United States.
The chart is a bar plot where each thin slice is one of several thousand children suffering from leukemia, and the color coding shows the amount of ancestry each child has from a particular geographic region. So, most self-identified white children have predominantly European ancestry (red), though there are other elements (gray is African, purple is Native American, and green is Asian). Black Americans are mostly African, with a minority European component. Hispanic Americans are a mixture of various groups, depending on ethnicity. Finally, notice the Asian Americans: they are mostly Asian, but there is a minor European component!
What’s going on here? It turns out that the hospitals recording race information are using the government categories. So Asian American includes South and East Asians, two very different groups genetically. In fact, South Asians are on average genetically somewhat closer to Europeans than they are to East Asians.
This is a problem because, as many of you know, different ethnic groups respond differently to various drugs, not to mention having divergent health risks (raise your hands if you have a parent with type 2 diabetes), and aggregating groups that are genetically very different into one racial category is problematic for biomedical research. I also realized that I was part of the problem, insofar as I do recall blithely entering “Asian” into the forms that were occasionally presented to me. Of course I’m aware that I’m a different kind of Asian than a Korean person, but I hadn’t given thought to the downstream effect of my decision to check the Asian box. And I’m someone with a life sciences background!
Unfortunately, on top of this confusion of socio-political terms, South Asians are one of the most undersampled groups in the world when it comes to genetic variation. Here are the South Asian groups in the Human Genome Diversity Project:
Notice something? They’re all Pakistani. Why? Because apparently the government of India threw a lot of red tape at the scientists who wanted to collect Indian data. This is still apparently an issue. An Indian American graduate student acquaintance told me recently that there had still been difficulty getting access to genetic material for the 1000 Genomes Project, though it looks like some Indian results will finally come online late in 2011.
Unfortunately, even the selected groups won’t be enough. South Asians are a very diverse category genetically, with a long history of endogamy and population substructure. What that means in plain English is that for medical purposes it may be particularly difficult to generalize from a small set of groups to the whole South Asian population (in contrast to Europeans and East Asians, who are rather homogeneous). This is one reason that Indian Americans are useful: the regulatory barriers are somewhat lower in the United States. But as we all know, Indian Americans are not a representative sample of South Asians, so that is no panacea.
Speaking of representative, note that above I mentioned offhand that on average South Asians are genetically closer to Europeans than to East Asians. I was careful to say on average, because I am part of the minority which is closer to East Asians than to Europeans. Because so much of the research has focused on Pakistani groups as proxies for South Asians, I believe that there has been an inordinate lack of awareness of the eastern element among groups like Bengalis. The 2009 paper “Reconstructing Indian population history” was a seminal contribution to understanding how South Asians came to be, but it too tended to neglect the eastern element of South Asian ancestry. The figure which I’ve posted is a PCA plot. In plain English, the x and y axes represent the two largest dimensions of genetic variation in Eurasian populations. They are not to scale: the x axis explains much more genetic variation than the y axis. Still, the plot shows the relationships of various populations clearly. Please observe my position. I’m clearly an outlier when compared to South Asians, shifted toward East Asians as you’d expect, away from the dominant linear trend.
Or am I an outlier? Eastern South Asians are undersampled compared to other South Asians, so my outlier position may simply reflect the fact that very few Bengalis have been genotyped (or Assamese, people from Orissa, etc.). There may be plenty of people between me and the primary South Asian cluster; we just don’t know.
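For readers who want to see what a PCA plot like the one described above is doing, here is a minimal sketch in Python. It is not the pipeline used to produce the figure; the data are entirely synthetic, and every name in it (`pca_project`, `freq_a`, `freq_b`, `freq_mixed`) is made up for illustration. It simulates two populations with different allele frequencies plus one individual whose frequencies are assumed to be a 75/25 blend of the two, then projects everyone onto the top two principal components.

```python
import numpy as np

def pca_project(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1 or 2).
    Returns (coords, explained), where explained is the fraction of total
    variance captured by each returned component.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)                      # center each SNP
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = (S ** 2)[:n_components] / np.sum(S ** 2)
    return coords, explained

# Synthetic data only: nothing here comes from real genotypes.
rng = np.random.default_rng(0)
n_snps, n_per_pop = 2000, 40

freq_a = rng.uniform(0.2, 0.8, n_snps)          # hypothetical population A
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)  # population B
freq_mixed = 0.75 * freq_a + 0.25 * freq_b      # assumed 75/25 blend

genotypes = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_mixed, size=(1, n_snps)),  # the lone mixed individual
])

coords, explained = pca_project(genotypes)
mean_a = coords[:n_per_pop, 0].mean()
mean_b = coords[n_per_pop:2 * n_per_pop, 0].mean()
mixed_pc1 = coords[-1, 0]

print("variance explained by PC1, PC2:", explained)
print("PC1 means (A, B) and mixed individual:", mean_a, mean_b, mixed_pc1)
```

In this toy setup the first component separates the two populations, and the blended individual lands between the clusters, nearer to A. With only one such person in the sample they look like an outlier, which is the same ambiguity raised above: sparse sampling alone can make someone appear isolated.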
Unlike the problem with medical research (which will require consciousness raising, politics, and a shift in fiscal resources), this is a gap in our knowledge that you can help fill. I generated the PCA above from results produced by my friend Zack Ajmal, who is behind the Harappa Ancestry Project. The goal of HAP is basically to fill in the gaps in South Asian genetics by collecting data from individuals who have been typed by 23andMe. Zack is looking for data from people of Iranian, South Asian, Tibetan, or Burmese background (he is also taking data from people with partial South Asian ancestry). You can see who has submitted so far. Unfortunately, my own family currently represents all of Bengal (only unrelated individuals are useful for this, so my parents amount to N = 2, as I’ve now been removed from the data set). Only one respondent who has provided data is from Uttar Pradesh. Zack has been cranking away at this for under a month, so I think he’ll have a more robust data set soon enough.
That’s all for now. In the next post I will address a strange pattern in the sample of Gujaratis from Houston, which is one of the few public Indian data sets. Genetics is powerful, but it has to be supplemented with other sorts of information. Hopefully readers of this weblog can chip in to resolve this mystery, because from what I can tell geneticists see the pattern but don’t know what might explain it (I have a hunch).
Addendum: If you want to contribute to HAP, or know someone who might want to, here’s the page with the details. 1 million SNPs from 23andMe will now cost you $260. I wouldn’t recommend it if you want medically actionable results. But you get your raw data, so you can do your own analysis.