A brown twist on personal genomics

I know that many people took advantage of the 23andMe sale I highlighted on Sunday. I also know that a fair number of these were brown, as I also have a list of people who I emailed, and several South Asians confirmed that they’d purchased the 23andMe kit. What do you get if you purchase this kit? Basically 1 million markers, SNPs, which are simply population-wide variant positions within your genome. These markers were chosen because variation is often informative, in terms of traits, as well as ancestry.

But obviously you are not going to just be looking at a string of letters. The data has to be analyzed for you. 23andMe provides a range of tools in this domain. But, one needs to use them cautiously, and also understand their limitations. In particular, these tools were often tuned for a specific set of populations which does not include South Asians. So some of the results are going to strike you as strange.

First, let’s hit the easy stuff. Health and traits.

[Image: traits1.png]

When you first go into your account you’ll see a list of options on the left. The screenshot above is of my own account. I’ll be using it as an example from this point on. The services basically fall into two categories: health and ancestry. Health itself is broken down into disease risk, carrier status, drug response, traits, and health labs. The two that are generally of any interest in my experience are disease risk and traits. Carrier status is something that is important for potential parents, but you’ll probably already get screened. Again, with drug response you should already know this because of your medical history. Finally, health labs is experimental stuff which I haven’t found of interest (does it matter how much of your weight is due to your genetic risk?).

Let’s start out with disease risk. I’m 90% confident that most of you will not find anything particularly shocking, novel, or actionable. By this, I mean that you’ll probably find that you’re at a high risk for diseases which already run in your family, or that you are at a marginally higher risk for a disease that doesn’t run in your family. There are exceptions, which is why I’m pegging it at 90%. A minority of people seem to find something genuinely novel, and unfortunately not in a good way.

If you do fall into the category of finding out some risk which you hadn’t expected, don’t freak out. Please make sure to read up on how these estimates are calculated. These are odds, and lack of family history is very important information.
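
To make "these are odds" concrete, here is a minimal Python sketch of how an odds ratio translates into an absolute risk. The baseline risk and odds ratio are made-up numbers for illustration, not figures from 23andMe, and the function name is hypothetical.

```python
def absolute_risk(baseline_risk, odds_ratio):
    """Convert a population baseline risk and an odds ratio into an absolute risk."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)  # probability -> odds
    odds = baseline_odds * odds_ratio                      # apply the variant's odds ratio
    return odds / (1.0 + odds)                             # odds -> probability


# Hypothetical example: a 2% baseline risk and a variant with an odds ratio of 1.5.
risk = absolute_risk(0.02, 1.5)
print(f"{risk:.1%}")  # prints 3.0%
```

A 50% bump in odds sounds alarming, but in this example it moves a 2% risk to roughly 3%.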

As I noted above these tools are fine-tuned to particular populations. Most of the disease risk estimates are explicitly for Europeans. That’s because studies are generally done on Europeans. Relatively speaking, there are simply very few studies done on South Asians. What this may mean is that if you have a risk due to a particular genetic variant, that risk may only hold for Europeans. The main caveat I would offer is that I am becoming convinced that this is less of an issue than had previously been thought. In other words, disease risk assessed in one population can be robustly extended to another, more often than not.

For what it’s worth, 23andMe tells me that I am at typical risk for type 2 diabetes. This is incorrect. I have family information, and my risk is higher than typical.

Most of the traits which you have a genetic predisposition for will not be surprising to you. I do not have an alcohol flush reaction. My earwax is wet. My eyes are brown. But some of the traits are of interest. I am a PTC non-taster. I already knew this, but only because PTC tasting is a basic high school genetics test. A large minority of humans are non-tasters, and from what I have seen non-taster status may be a majority among South Asians. Non-tasting is a recessive trait, which means that if you have one functional copy you are a taster. Some scholars have suggested that people with two functional copies are “super-tasters.” What does this mean practically? Generally non-tasters have reduced responses to bitters, and often higher satiety thresholds to fat. Non-tasters often like vegetables. If you are a potential parent this genotype is probably important, as you can estimate the range of outcomes in your offspring.
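
Estimating the range of outcomes in your offspring is just a Punnett square. Here is a small Python sketch, assuming a single gene with simple dominance. The labels "T" (functional copy) and "t" (non-functional copy) and the function names are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent_a, parent_b):
    """Return the probability of each child genotype, e.g. for 'Tt' x 'tt'."""
    # Each parent passes on one of their two copies with equal probability.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def taster_probability(parent_a, parent_b):
    """A child is a taster if they carry at least one functional copy ('T')."""
    outcomes = offspring_genotypes(parent_a, parent_b)
    return sum(p for genotype, p in outcomes.items() if "T" in genotype)


print(taster_probability("tt", "tt"))  # 0: two non-tasters have non-taster children
print(taster_probability("Tt", "tt"))  # 1/2
print(taster_probability("Tt", "Tt"))  # 3/4
```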

Some of the traits which you wouldn’t know of off the top of your head hopefully will never matter. I do not have resistance to HIV progression, for example. This is a trait which is most common among Europeans, so this makes sense. I have friends of European ancestry who have some resistance. I do hope it does not change their behavior too much!

Moving on to ancestry, there are far fewer caveats. Disease risk assessment is not quite “prime time,” but ancestry inference is. That’s because a million markers is more than enough just to estimate ancestry, which only requires a representative snapshot of your genome. But for South Asians 23andMe’s tools leave something to be desired.

Below is my “ancestry painting.” I am apparently 60% European and 40% Asian. You see a chromosome by chromosome color-coding of my European and Asian ancestry.


Does this make sense? Yes and no. 23andMe uses three “reference” populations: whites from Utah, Yoruba from Nigeria, and Han Chinese from Beijing. What their algorithm does is take your genetic variation, see how it relates to these three populations, and construct a set of affinities genomic region by genomic region. This works well if your mother is black American and your father is white European. You will be given plausible results. It does not work so well when you have populations which are very different from the reference groups, for example Somalis or South Asians. Within the constraints of the program the results make sense. In the real world you are scratching your head.
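
Here is a toy Python sketch of that region-by-region logic. To be clear, this is not 23andMe’s actual algorithm, which works with far more markers and is certainly more sophisticated. It only illustrates the constraint: each window has to be assigned to one of the reference labels, so a population that is not in the panel can only come out as a mix of the ones that are. All allele frequencies and genotypes below are invented.

```python
import math


def paint_windows(individual, references):
    """Assign each genomic window to the reference population that best explains it.

    individual: list of windows, each a list of 0/1 alleles.
    references: dict mapping population name -> list of windows, each a list of
                frequencies of allele 1 in that population.
    Returns the fraction of windows assigned to each population.
    """
    counts = {pop: 0 for pop in references}
    for w, alleles in enumerate(individual):
        best_pop, best_ll = None, -math.inf
        for pop, windows in references.items():
            # Log-likelihood of the observed alleles under this population's frequencies.
            ll = sum(math.log(p if a == 1 else 1.0 - p)
                     for a, p in zip(alleles, windows[w]))
            if ll > best_ll:
                best_pop, best_ll = pop, ll
        counts[best_pop] += 1
    return {pop: n / len(individual) for pop, n in counts.items()}


# Invented frequencies for five windows of three SNPs each.
references = {
    "European": [[0.9, 0.9, 0.1]] * 5,
    "Asian":    [[0.1, 0.9, 0.9]] * 5,
    "African":  [[0.9, 0.1, 0.9]] * 5,
}
# An invented individual whose windows happen to resemble two of the references.
individual = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [1, 1, 0], [0, 1, 1]]

print(paint_windows(individual, references))
# {'European': 0.6, 'Asian': 0.4, 'African': 0.0}
```

There is no "South Asian" label for the function to return, which is the whole problem in miniature.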

For example, South Asians tend to be a mix of European and Asian according to this program (with many Pakistanis showing trace levels of African). The Asian fraction ranges from 10 percent to 40 percent, with the vast majority of people in the 10 to 35 range, and an average around 25. The fact that I was at 40 percent was notable. Generally Punjabis are at the lower Asian fractions, while South Indians and Bengalis are at the higher Asian fractions. But my own results were the highest of anyone I could find. I strongly suspected that this indicated recent Asian admixture on top of the range you see in 23andMe’s algorithm for people of South Asian descent.

[Image: pca.png]

The PCA above also suggests that. What you see is a two dimensional visualization of the genetic variation of Central and South Asian populations. I’m the green individual. As you can see, I’m on the edge of the main South Asian cluster, toward the Hazara and Uyghur. These are two Central Asian populations with clear East Asian admixture. Aside from my family the other individuals in this area who are from the Indian subcontinent tend to be Bengalis, indicating that this group has a greater genetic affinity with East Asians than other Indians do.

As with the ancestry painting, one has to be careful with the PCA. Some of the individuals who are close in position to me have an East Asian parent and a European parent. These Eurasians are near some of the South Asians because their genetic combination places them in the same area when you visualize them on a two dimensional axis. But obviously there is a huge difference between Eurasians and South Asians. These tools of analysis and visualization can’t just be taken literally; they need to be understood in their proper light.

There’s much more on the 23andMe site, but if you are a real nerd the best thing is that they give you your raw data. That’s how I confirmed that my elevated Asian ancestry was probably due to relatively recent East Asian admixture. If you are not comfortable in the Linux environment, and want more detailed breakdowns, I’d suggest submitting to the Harappa Ancestry Project. It has a very large and robust South Asian data set to compare you against. But, if you want to get your own hands dirty, read on.

First, you need to get ADMIXTURE. My tutorial will be helpful, but I’ve decided to make it even easier. I’ve created a file with French, Gujarati, Pathan, Chinese, and Yoruba individuals. The Gujaratis have been screened for a high fraction of South Asian ancestry. I’ve zipped up the file here. It also has a “master list” of the individuals and their population assignments. What you probably want to do is:

1) get your own data

2) merge it with the file that I created for you

3) run ADMIXTURE, and see how you stack up

After you get your data, convert it to pedigree format. Here’s a script to do so in Perl. You can do this:

perl convert.pl "YourFileName" "001" "001"
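
If you are curious what a conversion like this involves, here is a rough Python sketch. It is not the Perl script linked above, just an illustration of the idea. It assumes the 23andMe raw file is tab-separated with columns rsid, chromosome, position, and genotype, with comment lines starting with `#`, and it guesses that the two "001" arguments are the family and individual IDs. The function name and the sample rows are made up.

```python
def convert_23andme(lines, family_id="001", individual_id="001"):
    """Turn 23andMe raw-data lines into one PED row and a list of MAP rows."""
    map_rows = []
    alleles = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chrom, pos, genotype = line.split("\t")[:4]
        if len(genotype) == 1:
            genotype = genotype * 2  # single-letter calls (e.g. male X) are doubled
        if len(genotype) != 2 or "-" in genotype:
            a1, a2 = "0", "0"        # PLINK's code for a missing genotype
        else:
            a1, a2 = genotype
        # MAP columns: chromosome, SNP id, genetic distance (0 = unknown), position
        map_rows.append(f"{chrom}\t{rsid}\t0\t{pos}")
        alleles.extend([a1, a2])
    # PED columns: family, individual, father, mother, sex (0 = unknown),
    # phenotype (-9 = missing), then two alleles per SNP.
    ped_row = " ".join([family_id, individual_id, "0", "0", "0", "-9"] + alleles)
    return ped_row, map_rows


# Made-up sample rows in the assumed format.
sample = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs_example1\t1\t1000\tAA",
    "rs_example2\t1\t2000\tAG",
    "rs_example3\t1\t3000\t--",
]
ped_row, map_rows = convert_23andme(sample)
print(ped_row)  # 001 001 0 0 0 -9 A A A G 0 0
```

Writing `ped_row` to YourFileName.ped and `map_rows` to YourFileName.map, one row per line, would give PLINK the pair of files it expects from `--file YourFileName`.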

Download Plink. Make your pedigree file binary:

./plink --file YourFileName --make-bed --out YourFileName

Now you want to merge it with the pedigree file I gave you:

./plink --bfile YourFileName --bmerge Brown.bed Brown.bim Brown.fam --make-bed --out Brown

You probably want to filter the SNPs:

./plink --bfile Brown --geno 0.01 --make-bed --out Brown
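
For reference, `--geno 0.01` tells PLINK to drop any SNP whose missing call rate exceeds 1%. In a panel of fewer than 100 people a single missing call already exceeds that, so in practice this keeps only markers that were genotyped in essentially everyone, including you. Here is a small Python sketch of the same rule. The data layout, with `None` for a missing call, and the function name are invented for illustration and have nothing to do with PLINK’s internals.

```python
def filter_by_missingness(calls_by_snp, max_missing=0.01):
    """Keep SNPs whose fraction of missing calls is at most max_missing.

    calls_by_snp: dict mapping SNP id -> list of genotype calls, one per
                  individual, with None standing in for a missing call.
    """
    kept = {}
    for snp, calls in calls_by_snp.items():
        missing_rate = sum(call is None for call in calls) / len(calls)
        if missing_rate <= max_missing:
            kept[snp] = calls
    return kept


# Hypothetical panel of four people and three SNPs.
panel = {
    "snp_a": ["AA", "AG", "GG", "AG"],
    "snp_b": ["CC", None, "CT", "CC"],  # one missing call out of four = 25%
    "snp_c": ["TT", "TT", "TT", "TT"],
}
print(sorted(filter_by_missingness(panel)))  # ['snp_a', 'snp_c']
```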

OK, so now you’re added to the file. You want to run ADMIXTURE:

./admixture Brown.bed 4

This will fit four ancestral populations (K = 4), which is what you want.

I gave you a file which will be somewhat informative if you are brown. I ran the above file for myself, and you can see me right between the second cluster of Chinese and Gujaratis. Note the East Asian slice.
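
To pull out your own numbers: ADMIXTURE writes the ancestry fractions to Brown.4.Q in the working directory, one row per individual in the same order as Brown.fam, with one column per inferred population. A minimal Python sketch for looking up one person follows. The function name is hypothetical, and the example rows are invented rather than real output.

```python
def ancestry_fractions(fam_lines, q_lines, individual_id):
    """Return the ancestry fractions for one individual.

    fam_lines: lines of the .fam file (the second column is the individual ID).
    q_lines:   lines of the matching .Q file, in the same order.
    """
    fam_lines = [line for line in fam_lines if line.strip()]
    q_lines = [line for line in q_lines if line.strip()]
    if len(fam_lines) != len(q_lines):
        raise ValueError("the .fam and .Q files must have the same number of rows")
    for fam_line, q_line in zip(fam_lines, q_lines):
        if fam_line.split()[1] == individual_id:
            return [float(value) for value in q_line.split()]
    raise KeyError(individual_id)


# Invented rows standing in for Brown.fam and Brown.4.Q.
fam = ["FR1 FR1 0 0 0 -9", "001 001 0 0 0 -9"]
q = ["0.97 0.01 0.01 0.01", "0.05 0.60 0.30 0.05"]
print(ancestry_fractions(fam, q, "001"))  # [0.05, 0.6, 0.3, 0.05]

# With real files, something like:
#   with open("Brown.fam") as f, open("Brown.4.Q") as g:
#       print(ancestry_fractions(f, g, "001"))
```

The columns come out in no particular order, so you need the master list to work out which column corresponds to which population: see which column is near 1 for the French, the Chinese, and so on.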


4 thoughts on “A brown twist on personal genomics”

  1. Shucks I should have participated. Next time I’ll definitely give it a hard think and probably do it.

  2. This is a great post. Ancestry tools could be set up to use different reference populations, and should be. South Asia and South America are at the top of the list.

  3. Ooh, nerdylicious – thanks for the 23andMe primer! I took advantage of the 23andMe sale, and shall submit to the Harappa project. Hope I won’t be the lone Muslim Bihari ;-)