About a month ago I put up my first post on this weblog. I argued that South Asians have been genetically undersampled, which is rather alarming considering that Indians, Pakistanis, Bangladeshis, Nepalis, Sri Lankans, etc., are 1/6th of humanity. The alarming part is that understanding population genetic structure is considered a precondition for much medical genetic research. Without taking genetic relatedness into account one cannot usefully establish correlations between particular variants and particular diseases or traits. For example, it is well known that South Asians have a high risk for diabetes, but the possible risk variation within this genetically diverse group has barely been addressed (there are some data which indicate that Bengalis and South Indians are at more risk than Punjabis).
Author Archives: Razib Khan
The decline of Hindi among American brown folk
A few days ago Taz pointed me to the fact that Census 2010 had been releasing a lot more data in the past few months. I was naturally curious, so I decided to check out the website. Mind you, I’m someone rather familiar with the older website, and have downloaded raw data sets and crunched them with R. So I was cautiously optimistic. Very cautiously. Apparently the big news is that the American Factfinder now has a “web 2.0” version which will be releasing 2010 count data. Unfortunately, the implementation of AJAX makes the site very, very slow (and beware of non-supported browsers!). And there isn’t that much 2010 information.
To review, there are decennial censuses which are straightforward counts, and, since the aughts there have been American Community Survey results which are based on a sample, and so have a margin of error (this makes county-level data less useful because for small subpopulations within a county the margin of error can be very large). I’m looking forward to 2010 results because they are going to be much more robust than the ACS data which makes the news periodically for small subgroups. You can see where I’m going with this, since the core readership of this weblog comes from a group which is itself subdivided by nationality, ethnicity, class, and religion.
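A rough sketch of why sample-based estimates get shaky for small subgroups. It assumes simple random sampling and the 90% confidence level the ACS uses when it publishes margins of error; the real ACS design is more complex, and the function names, the 1% subgroup share, and the sample sizes below are hypothetical, chosen only for illustration.

```python
import math

Z_90 = 1.645  # critical value for a 90% confidence level


def margin_of_error(p, n, z=Z_90):
    """Approximate margin of error for a proportion p estimated from
    n sampled people, assuming simple random sampling (a simplification)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


def relative_error(p, n, z=Z_90):
    """Margin of error expressed as a fraction of the estimate itself."""
    return margin_of_error(p, n, z) / p


if __name__ == "__main__":
    share = 0.01  # hypothetical subgroup making up 1% of the population
    # Hypothetical sample sizes, from a single county up to a large state.
    for n in (500, 5_000, 50_000, 500_000):
        moe = margin_of_error(share, n)
        rel = relative_error(share, n)
        print(f"n={n:>7}: estimate {share:.2%} +/- {moe:.3%} "
              f"(relative error {rel:.0%})")
```

The margin of error shrinks only with the square root of the sample size, so a county-sized sample leaves a small subgroup's estimate with an error comparable to the estimate itself.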
Though the initial intent was to find 2010 results for Indian Americans which I could compare across the older censuses, I stumbled onto some interesting language data. The pie chart is based on 2006-2008 ACS data. I’ve put a csv file of the data online. For the pie chart I removed some languages with fewer than 1,000 claimed speakers. The original data can be found here. The sample was limited to those aged 5 and over. If these data are correct, ~2.25 million Americans spoke indigenous South Asian languages at home in the second half of the aughts.
A civilization of regions
It is well known that in Western Europe the south of Italy is the poorest region. It is less well known, though not totally surprising, that regions of northern Italy such as Lombardy are among the wealthier areas of Western Europe. The aggregation of some of Europe’s wealthiest and poorest regions into one nation, Italy, obscures some very interesting fine-scale trends. But since this is a weblog about brown-folk I’m not going to be discussing variance in statistics across the European Union. Rather, I want to address the issue of variance of statistics, and culture, across South Asia: India, Pakistan, Bangladesh, Nepal, and Sri Lanka (Bhutan is so small that I will leave it out of this treatment).
Intuitively we know that comparing Sri Lanka to India, or India to Pakistan, is apples and oranges. In terms of administrative units in the post-Westphalian age they’re equivalent, but we know that a nation of 20 million (Sri Lanka) and one of over 1 billion (India) are ludicrously mismatched. Even when comparing Pakistan to India, you have to face the fact that one state of India, Uttar Pradesh, is more populous than all of Pakistan! If Uttar Pradesh were a nation unto itself it would be the fifth most populous in the world.
Do that Guju you do!
The 2009 paper Reconstructing Indian population history was a watershed in understanding the genomics of South Asians. Before this point the studies had used unrepresentative samples or fewer markers, or had treated South Asians only as a sidelight. This paper put the focus on South Asians to elucidate the group’s population history (it still undersampled eastern South Asians, though this seems part of the plan because of their focus on two, not three, ancestral Indian components). If you want to know more about the paper, here is the ungated version. But in this post I want to focus on an issue which you can find only in the supplements to the paper.
The HapMap project, which surveys genetic variation in world populations, includes a sample of Gujaratis from Houston, Texas. This is currently the primary population of Indian origin you have to work with in the public data sets. There are other South Asian populations in the public domain, but their number of markers is far lower. So the Gujarati sample is very useful right now. But one thing that immediately jumps out at you is that there are in fact two Gujarati clusters. In the PCA plot I’ve extracted from the supplements you see the two largest components of genetic variation. PC 1, the x axis, separates whites from South Asians, and PC 2 separates one group of Gujaratis from everyone else. What’s going on here?
The undersampled 1 billion (genetically that is)
Two issues compel this post. One is practical. The other is more, shall I say, spiritual (or at least fun!). With regard to the first, a few weeks ago I reviewed a paper which reported that the efficacy of response to a particular leukemia treatment regime was dependent on the amount of Native American ancestry an individual had. One has to be specific here, because many people who are white or black American have significant Native American ancestry (Brett Favre’s paternal grandfather was Choctaw), and many people who identify as Native American may not have as much Native American ancestry as others. But for the purposes of this blog post, I want to bring to your attention the figure above, which I extracted from the paper. Its implications may pose a major problem in the future for South Asian biomedical research in the United States.