About two and a half months ago I drew your attention to the fact that there is population substructure in the Gujaratis of Houston. That might sound strange, but here’s the back story. Over the past 10 years or so there has been a project attempting to catalog common human genetic variation, known as the HapMap. The HapMap began with East Asian, West African, and European groups, but over the years it has been expanding. The first South Asian population added to the database was a set of people of Gujarati origin in Houston, Texas. As a result, the medical genetic literature contains a lot of talk about “Gujaratis from Houston,” as if that were a group of particular importance.
The ultimate pragmatic rationale for the catalog was to allow researchers to control for ancestry when attempting to fix upon genes implicated in disease. By way of illustration, suppose Chinese have disease X at a greater frequency than Europeans. If you analyzed a common pool of Chinese + Europeans, then all the genetic variants that are simply more common among the Chinese might come up as causal, when actually it’s just a correlation with ancestry.
And this brings me to the Houston Gujaratis. One thing that jumps out at you in analyses of genetic variation of this population set is that it has substructure. That is, there are two populations within the data set. More precisely, there is one tight cluster, while the rest of the individuals vary a great deal in their genetic character. The image above is my own plotting of the variation of Chinese and the Houston Gujaratis onto a cubic space. You immediately see that there is a Chinese cluster and a Gujarati cluster, and a range of Gujaratis who fall outside of the main cluster.
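To make the earlier point about controlling for ancestry concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two populations, the carrier frequencies, and the disease rates are invented numbers, not HapMap data. Within each population the variant has no effect on disease at all, yet pooling the two makes it look associated.

```python
# Hypothetical illustration: all numbers are invented, not taken from the HapMap.
# The two populations differ in BOTH variant frequency and disease rate,
# but within each population the variant has no effect on disease.
POPULATIONS = {
    "pop_a": {"n": 1000, "carrier_freq": 0.8, "disease_rate": 0.20},
    "pop_b": {"n": 1000, "carrier_freq": 0.2, "disease_rate": 0.05},
}


def carrier_freq_in_cases_and_controls(groups):
    """Return (carrier frequency among cases, among controls) from expected counts."""
    case_carriers = cases = control_carriers = controls = 0.0
    for group in groups:
        n_cases = group["n"] * group["disease_rate"]
        n_controls = group["n"] - n_cases
        # Within a group, carrying the variant is independent of disease status.
        case_carriers += n_cases * group["carrier_freq"]
        control_carriers += n_controls * group["carrier_freq"]
        cases += n_cases
        controls += n_controls
    return case_carriers / cases, control_carriers / controls


pooled = carrier_freq_in_cases_and_controls(POPULATIONS.values())
within_a = carrier_freq_in_cases_and_controls([POPULATIONS["pop_a"]])
within_b = carrier_freq_in_cases_and_controls([POPULATIONS["pop_b"]])

print("pooled cases vs controls:", pooled)
print("pop_a  cases vs controls:", within_a)
print("pop_b  cases vs controls:", within_b)
```

In the pooled sample the variant is clearly more common among cases than controls (roughly 0.68 versus 0.47 with these made-up numbers), while inside either population the two frequencies are the same. That gap is pure ancestry, which is why you want a reference catalog to control for it.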
Knowing what we know about the prevalence of endogamy among South Asians, the model which immediately jumped out at me was that the Houston Gujarati cluster was a specific subgroup which migrated to the United States. But who? My immediate hunch was that they might be a group of Patels. Others of you suggested Bohras.
I can now report something substantive thanks to Zack Ajmal. He has some Gujarati Patels in the Harappa Ancestry Project, and they match closely with the Gujarati cluster in question. This does not exclude the possibility that the cluster consists of Bohras, and does not entail that it must be Patels. I don’t know the relationship between these various groups in Gujarat. But I think we’re getting closer to a resolution of this mystery at least.
Of course the Gujarati HapMap cluster is not unknown to scholars. Two years ago, in the supplements of the paper Reconstructing Indian Population History, the authors observed the peculiar pattern in the principal component plots, which visualize the largest independent dimensions of variation in a data set. Most of the Indian populations fell along a line which has at one pole various groups like South Indian Dalits and at the other pole Europeans. But as the authors note, a section of the Gujaratis fell outside of the expected pattern. Why? Here is their hypothesis:
…Interestingly, one of the GIH subgroups fall outside the main gradient of Indian groups, suggesting that they harbor substantial ancestry that is not a simple mixture of ASI and ANI. A speculative hypothesis is that some Gujarati groups descend from the founders of the “Gurjara Pratihara” empire, which is thought to have been founded by Central Asian invaders in the 7th century A.D. and to have ruled parts of northwest India from the 7-12th centuries. I. Karve noted that endogamous groups with names like “Gurjar” are now distributed throughout the northwest of the subcontinent, and hypothesized that that they likely trace their names to this invading group.
This is wrong. The reason that a subset of Gujaratis falls outside of the main cluster is that they are a very genetically homogeneous group. This is the same reason you exclude close relatives from these analyses; the relatives will shake out into their own clusters, which is obviously not what you want cluttering up the results. All the Gujaratis who are not in the cluster run the gamut you would expect in terms of ancestry for individuals from Central West India. Those in the distinctive cluster have a particular pattern in common.
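A toy sketch of what “genetically homogeneous” means here, in Python. This is not a principal components analysis and it uses no real data: the 0/1 markers, the `noisy_copy` helper, and the 5% flip rate are all assumptions for illustration. It just compares how different two individuals are on average inside a group descended from one recent founder versus inside a group of unrelated individuals.

```python
import random
from itertools import combinations

random.seed(0)   # fixed seed so the sketch is repeatable
N_SITES = 500    # number of toy 0/1 markers per individual (an assumption)


def random_genotype():
    """An unrelated individual: every marker drawn independently."""
    return [random.randint(0, 1) for _ in range(N_SITES)]


def noisy_copy(founder, flip_rate=0.05):
    """A descendant of a shared founder: the same markers with a few flipped."""
    return [1 - g if random.random() < flip_rate else g for g in founder]


def mean_pairwise_diff(individuals):
    """Average fraction of markers that differ between two individuals."""
    pairs = list(combinations(individuals, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / (len(pairs) * N_SITES)


founder = random_genotype()
homogeneous = [noisy_copy(founder) for _ in range(10)]
diverse = [random_genotype() for _ in range(10)]

tight = mean_pairwise_diff(homogeneous)
spread = mean_pairwise_diff(diverse)
print("homogeneous group:", round(tight, 3))
print("diverse group:    ", round(spread, 3))
```

The homogeneous group comes out with far fewer pairwise differences than the diverse one. Shared variation of that kind is what a clustering or principal components method picks up on, so a homogeneous subgroup can get its own cluster without any exotic ancestry being involved.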
To the left is a bar plot I generated from a selection of individuals and populations from Zack’s K = 11 ADMIXTURE run. You can see the raw data in Google Docs. What K = 11 means is that Zack took all the individuals in his data set, which runs into the thousands, and allowed the program to apportion their ancestry among 11 populations. These are not necessarily real populations, but abstractions, so you shouldn’t take the labels too seriously. I’ve limited the plot to the population components of particular relevance for South Asians. The labels in all caps are individuals from public data sets. Those which are not in all caps are individuals from the Harappa Ancestry Project. I’ve chosen the individuals and populations to be informative of my overall point. What is that point? The “Patel” Gujarati cluster is among the most “pure” of South Asian populations. The Bengali to the left is my mother, and you can see that her South Asian proportion drops mostly because of her elevated East Asian ancestry. Among the Jatts the European and Southwest Asian proportion is higher. The “Onge” component refers to an affinity with a tribe in the Andaman Islands. This, combined with the “S Asian” component, is probably a good shadow of the patterns of variation which denote deep ancestral roots within the Indian subcontinent. Combining the two, you see that the Gujarati cluster and the individuals affiliated with it top out in excess of 90%! I think this is the outcome of the ancient admixture event between “Ancestral North Indians” and “Ancestral South Indians” which defines South Asians as a distinctive genetic unit on a worldwide canvas. All those who came later, whether Austro-Asiatics, Aryans, or Scythians, are overlays upon this robust common substrate.
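For readers who haven’t seen this kind of output, here is a rough sketch in Python of how the “combined” figure is read off. The layout is an assumption (one row of fractions per individual, summing to 1), and the numbers and sample names are invented placeholders, not values from Zack’s spreadsheet. I’ve also used five components instead of eleven to keep it short.

```python
# Hypothetical ADMIXTURE-style table: each row is one individual, each value is the
# fraction of ancestry assigned to an abstract component. All values are invented.
COMPONENTS = ["S Asian", "Onge", "European", "SW Asian", "E Asian"]

samples = {
    "sample_1": [0.80, 0.12, 0.04, 0.03, 0.01],
    "sample_2": [0.55, 0.10, 0.05, 0.05, 0.25],
    "sample_3": [0.50, 0.05, 0.25, 0.18, 0.02],
}

# The two components treated above as a shadow of deep subcontinental ancestry.
DEEP_COMPONENTS = ("S Asian", "Onge")


def combined_fraction(fractions, wanted=DEEP_COMPONENTS):
    """Sum the fractions belonging to the named components."""
    return sum(f for name, f in zip(COMPONENTS, fractions) if name in wanted)


combined = {name: combined_fraction(row) for name, row in samples.items()}

for name, value in combined.items():
    print(f"{name}: {value:.0%} S Asian + Onge")
```

With these placeholder rows, `sample_1` is the sort of profile I mean by topping out above 90%, while the other two are pulled down by East Asian or by European and Southwest Asian fractions respectively.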
Ironically, the geneticists who decided to select the Gujaratis of Houston stumbled onto a group which is archetypically representative of what it means to be South Asian in a biological sense.