The Journal of Economic Perspectives has an excellent article by Beauchamp and colleagues titled Molecular Genetics and Economics (ungated pdf here). It is a nice contrast to another article in the same issue, Charles Manski’s bashing of the heritability straw man.
The authors argue that “genoeconomics”, the use of molecular genetics in economics, has the potential to supplement traditional behavioural genetic studies and build an understanding of the biology underlying economically relevant traits. They note that behavioural genetics, particularly research into heritability, has produced compelling evidence of the link between economically important characteristics and DNA. Molecular genetics is an “exciting tool” that can now be turned to this area.
However, potential pitfalls mar the way forward. These pitfalls are beautifully illustrated by a study the authors undertook in which they sampled over half a million single-nucleotide polymorphisms (SNPs) from each of 7,500 people. An SNP is a DNA sequence variation where a single nucleotide differs between people. They then searched for SNPs associated with educational attainment. They found a large number of associations, many passing a significance threshold of 10⁻⁶. Passing this threshold means that, if an SNP had no real relationship with education, there would be only a one in a million chance of seeing an association that strong (of course, there were 500,000 chances). If they had taken this result to the right journal, they might have had their study published and earned some headlines about “the education gene”.
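The "500,000 chances" point can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not something from the paper: it assumes no SNP has any true effect and, for simplicity, that the tests are independent of one another (real SNPs are correlated, so treat the second number as rough).

```python
# Hypothetical illustration of multiple testing; names and the
# independence assumption are ours, not the paper's.
N_SNPS = 500_000   # roughly the number of SNPs tested, as quoted above
ALPHA = 1e-6       # per-SNP significance threshold, as quoted above


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing the threshold if no SNP has any effect."""
    return n_tests * alpha


def prob_any_false_positive(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


print(expected_false_positives(N_SNPS, ALPHA))
print(round(prob_any_false_positive(N_SNPS, ALPHA), 3))
```

Under these assumptions you would expect about half a false positive per scan, and roughly a 39 per cent chance of at least one, even at the one in a million threshold.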
However, the authors took the 20 most significant associations from the first sample and checked them against the SNPs from another sample of 9,500 people. In the second sample, none of these 20 SNPs was significantly associated with educational attainment, even using a weak five per cent significance test. This suggested that the results from the first sample were spurious.
The authors noted some important lessons from this. The first is that given the small sample sizes of many studies, the probability of a true association being discovered among the noise is minute. The studies are underpowered, where power is the probability that an association between an SNP and the trait of interest will be found when there is a relationship. The fact that almost all SNPs reported in the literature explain very little of the variation in most traits exacerbates this problem, as the studies are trying to detect small effects. For example, no marker has been found to predict more than one per cent of the variation in height between people. As a result, very large samples are required to find true associations and sort them from the noise.
For example, with a five per cent threshold test for significance and an SNP that explains 0.1 per cent of variation in a trait, you need a sample of 4,000 subjects before the association has a 50 per cent chance of being found. Yet, in a 500,000 SNP panel there are likely to be tens of thousands of false positives that meet the five per cent significance level (five per cent of 500,000 is 25,000).
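The 4,000-subject figure can be sanity-checked with a back-of-the-envelope power calculation. This is a minimal sketch under a simplifying assumption of ours (not necessarily the method the authors used): the test statistic for a single SNP is treated as approximately normal with mean sqrt(n × r2) and unit variance, which is a standard large-sample approximation when the share of variance explained, r2, is small. The function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def approx_power(n: int, r2: float, alpha: float) -> float:
    """Approximate power of a two-sided test for one SNP.

    ASSUMPTION: the z statistic is roughly normal with mean sqrt(n * r2)
    and unit variance, where r2 is the share of trait variance explained.
    """
    z_crit = -_STD_NORMAL.inv_cdf(alpha / 2)  # two-sided critical value
    shift = (n * r2) ** 0.5                   # expected size of the z statistic
    # Probability the statistic falls beyond either critical value.
    upper = 1.0 - _STD_NORMAL.cdf(z_crit - shift)
    lower = _STD_NORMAL.cdf(-z_crit - shift)
    return upper + lower


# 4,000 subjects, SNP explaining 0.1 per cent of variation, five per cent test
print(round(approx_power(4_000, 0.001, 0.05), 2))
```

With these inputs the approximation gives a power of roughly one half, in line with the figure above.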
If the significance threshold is tightened by a factor of about five million, from five per cent to 10⁻⁸, which is appropriate given the huge number of potential associations being tested for in a 500,000 SNP panel, the need for a large sample size increases. For an SNP that explains 0.1 per cent of variation in a trait, the study will need a sample of around 25,000 to have a 50 per cent chance of detecting the relationship. If the SNP explains 0.01 per cent of the variation, a sample size of 200,000 results in only a 20 per cent chance of finding the relationship. The more stringent test does reduce the number of false positives, but that reduction comes at the cost of power, which must be compensated for by a larger sample. At this time, there is little useful genetic data available in samples of this size.
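To see how quickly the required sample grows as the threshold tightens, the calculation can be turned around to ask what sample size gives a target power. Again this is only a sketch under an assumption of ours, the same large-sample normal approximation (z statistic with mean sqrt(n × r2)), and the function name is hypothetical. It will not reproduce the paper's figures exactly; it is an order-of-magnitude check.

```python
from statistics import NormalDist


def approx_required_n(r2: float, alpha: float, power: float = 0.5) -> float:
    """Approximate sample size needed to detect one SNP with the given power.

    ASSUMPTION: normal approximation with z statistic mean sqrt(n * r2);
    the negligible chance of rejecting in the wrong tail is ignored.
    """
    z = NormalDist()
    z_crit = -z.inv_cdf(alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)      # zero when the target power is 50 per cent
    return (z_crit + z_power) ** 2 / r2


# SNP explaining 0.1 per cent of variation, 50 per cent power
for alpha in (0.05, 1e-6, 1e-8):
    print(alpha, round(approx_required_n(0.001, alpha)))
```

Under this approximation the required sample rises from a few thousand at the five per cent threshold to the tens of thousands at 10⁻⁸, the same order of magnitude as the figures above, though the exact numbers depend on the test and approximation used.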
Beyond the power issue, the authors identified publication bias as a problem. Papers which find interesting relationships are more likely to be published, which creates incentives for data mining and the write-up of results that are interesting but not robust. It is not easy to find a publisher for a paper that shows no relationship. This paper by Beauchamp and colleagues is the exception that proves the rule. To get their negative finding published they turned it into an analysis of the broader use of genetics in economics.
They do note, however, that data mining in genoeconomics is not in itself bad. The problems arise when it is not accompanied by robust methodologies and stringent review processes.
Beauchamp and colleagues close their paper by noting some benefits of the genoeconomics enterprise. They endorse the use of genetic information in policy, even where the causal mechanisms are not known. They give the example of targeting children with markers for dyslexia with alternative teaching methods. This is a good long-term goal, but we will want to have SNPs explaining more variation in traits before this will be useful. For now, family history or information about siblings and twins is more useful information. How much of that information is being used now?
More interestingly, they suggest that this genetic data could be used as a control variable in other economics studies. If it is known that, say, income varies with certain SNPs, those SNPs might be used as a control in a study of how certain environmental factors affect income.
Their last suggestion is that the information obtained from genoeconomics could be used to understand variation in policy response across people. Compared to the standard economic assumption that everyone is the same, this might be the most radical effect of the genoeconomics enterprise.