Applying Genetic Research To My 23andMe Data
To calculate your risk of various diseases, 23andMe scours the medical research literature for studies that correlate incidence rates for those diseases with mutations at specific locations in the human genome. The locations where these mutations commonly occur are referred to as single-nucleotide polymorphisms, or SNPs.
In this post, I show how I applied the findings of this study about caffeine-induced anxiety to discover more about myself. I have no genetic research background whatsoever, and my knowledge of genetics is minimal, so it's amazing that this is slowly becoming accessible to a wider audience.
The Genetic Culprit #
From the study abstract:
At the 150 mg dose of caffeine, we found a significant association between caffeine-induced anxiety (Visual Analog Scales, VAS) and ADORA2A rs5751876 (1976C/T), rs2298383 (intron 1a) and rs4822492 (3′-flank), and DRD2 rs1110976 (intron 6).
The SNPs rs5751876
, rs2298383
, rs4822492
, and a few others are claimed here to relate to caffeine-induced anxiety. Reading further, Figure 4 and Figure 5 show which variations are correlated with higher anxiety. Here's a quick summary of the high-anxiety variations listed in those figures:
rs5751876
: T/T (vs. C/C, C/T)rs2298383
: C/C (vs. T/T, C/T)rs4822492
: C/C (vs. G/G, G/C)rs1110976
: G/G (vs. G/-)
Armed with this information, I can check my own genome for variations at those SNPs.
Checking My Genome #
In the appendix of this blog post, I discuss how to retrieve your genetic data from the 23andMe API. I followed those directions with two changes:
- I used the authorization scope
genomes
; - I accessed the API endpoint
https://api.23andme.com/1/genomes/<id>
, where<id>
is my internal ID. This ID is returned with every response from their API.
I downloaded the genomic data as genome.json
:
curl https://api.23andme.com/1/genomes/<id>/ -H "Authorization: Bearer <access_token>" > genome.json
To help extract the specific SNPs listed above, I wrote
a quick Python script.
Running it, I get my results:
$ python extractSNP.py rs5751876 rs2298383 rs4822492 rs1110976 < genome.json
rs5751876 TT
rs2298383 CC
23andMe doesn't provide data on the rs4822492
or rs1110976
SNPs, but I can see that I have the high-anxiety variations at rs5751876
and rs2298383
.
The script uses 23andMe's SNP list, which identifies the locations of all SNPs that they look for. The first time you run extractSNP.py
, it will generate an index of all 23andMe SNPs in .snps-index
to make subsequent runs faster. Once you have your genome.json
data and have built the .snps-index
index, you can look up any SNP or group of SNPs in about a second.
Summary #
To apply genetic research findings to your genomic data:
- Use the process described in this blog post with scope
genomes
and API endpointhttps://api.23andme.com/1/genomes/<id>
to download your genetic data. Save it asgenome.json
. - Download my script into the same folder as
genome.json
. - Find a paper, journal article, etc. that mentions specific SNPs (those funny
rsXXX
oriXXX
numbers). - Run
python extractSNP.py rsXXX < genome.json
to search for those SNPs in your genome.
Caveats #
This is still a highly manual process, and you'll need some familiarity with the command line to pull it off. We haven't quite reached the promised near-future where the deepest insights about our personal genetics are easily available.
23andMe's data is incomplete: they only had two of the four listed SNPs. That said, it's far better than nothing!
Also, there's room for skepticism regarding the underlying research. Only 102 participants took part, all of European-American descent. The subjects all had stated caffeine intake levels of less than 300mg per day, or about 3 cups' worth - that's a pretty wide range of potential tolerances. Subjects rated their anxiety on a subjective scale. The researchers also took measurements of heart rate and blood pressure, but did not incorporate that data into the anxiety rankings.
No single study is perfect. For more reliable results, you'll want to dig up findings that are supported by multiple studies.
- Next: Quadtree Cartography
- Previous: The Behavioral Economics of 23AndMe Results