
Understanding Genetic Drift with the Help of R

A long-standing debate among evolutionary biologists concerns the contribution of genetic drift to evolving populations. Ronald Fisher and Sewall Wright were among the first scientists to take opposing positions on this topic: the former saw selection as the major engine of population evolution, while the latter argued that genetic drift could have a paramount effect, especially in small populations.

What is genetic drift? Genetic drift is the random change in allele frequencies from one generation to the next that results from sampling a finite number of gametes; generation after generation, it erodes genetic variability within populations. By “random” I mean that it is impossible (or almost impossible) to predict the direction of this process (i.e. whether an allele will increase or decrease in frequency). This, though, doesn’t prevent us from trying to quantify the effect of genetic drift on allele frequency. As we shall see, the magnitude of genetic drift depends strongly on population size.

Let's assume that we have a very large population. Individuals have been genotyped for a diallelic locus (alleles A and a). It turns out that the frequency of A is p = 0.5, and the frequency of a is therefore 1 – p = 0.5. One generation goes by. How large a change in allele frequency should we expect to see in this new generation? Let's try to work this out, keeping in mind that we are only looking at genetic drift and not at any other evolutionary mechanism, such as selection or the appearance of new mutations.

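A different way to ask the above question is: what is the probability of having exactly the same allele frequency in the new generation if its N individuals are formed by sampling 2N gametes from the previous one? (The factor of two reflects that we are working with a diploid species, in which each individual carries two copies of the locus.)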
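A minimal sketch of this sampling step, assuming the simplest (Wright-Fisher-style) model in which the 2N gene copies of the new generation are drawn independently, with replacement, from the allele pool of the parent generation. The helper name `next_freq` is hypothetical, chosen only for illustration.

```r
# Hypothetical helper: one generation of pure drift.
# Draw 2N gene copies with replacement from a pool in which A has
# frequency p, and return the frequency of A among the copies drawn.
next_freq <- function(p, N) {
  n_copies <- 2 * N                              # diploid: two copies per individual
  n_A <- rbinom(1, size = n_copies, prob = p)    # number of A copies drawn
  n_A / n_copies
}

set.seed(1)                 # make the random draws reproducible
next_freq(0.5, N = 25000)   # large population: stays close to 0.5
next_freq(0.5, N = 5)       # small population: can stray far from 0.5
```

Running the last line several times makes the point of the post quite directly: with only ten gene copies, the new frequency jumps around in steps of 0.1.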

For example, if we draw 50000 gametes (i.e. N = 25000 individuals), we need exactly 25000 of them to carry A in order to maintain p = 0.5. This is a classic binomial setting: there are only two possible outcomes (allele A or a), the probability of each outcome stays the same every time we draw a gamete (we sample with replacement), and every draw is independent of the others. So, what is the probability that, drawing 50000 gametes, we get exactly 25000 successes, i.e. 25000 copies of allele A? In R this is actually pretty easy. The following code calculates the binomial probability of exactly 25000 successes in 50000 trials, given a probability of success of 0.5.

> dbinom(25000, 50000, 0.5)
[1] 0.00356823
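To see what `dbinom()` is computing, we can write out the binomial formula P(k) = choose(n, k) p^k (1 – p)^(n – k) ourselves. This is only a sketch: the helper `binom_pmf` is hypothetical, it assumes 0 < p < 1, and it works on the log scale because choose(50000, 25000) is far larger than the biggest number a double can hold.

```r
# Hypothetical helper: binomial probability mass function written out by hand.
# P(k successes in n trials) = choose(n, k) * p^k * (1 - p)^(n - k),
# evaluated on the log scale (lchoose) to avoid overflow/underflow for large n.
# Assumes 0 < p < 1.
binom_pmf <- function(k, n, p) {
  log_prob <- lchoose(n, k) + k * log(p) + (n - k) * log(1 - p)
  exp(log_prob)
}

binom_pmf(25000, 50000, 0.5)   # matches dbinom(25000, 50000, 0.5) up to rounding
binom_pmf(5, 10, 0.5)          # matches dbinom(5, 10, 0.5) up to rounding
```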

Well, that probability is rather small. Would it change if we sampled fewer gametes? We can try with ten (i.e. five individuals), in which case we would need five A alleles to maintain p = 0.5:

> dbinom(5, 10, 0.5)
[1] 0.2460938
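Since `dbinom()` is vectorised, we can ask the same question for many sample sizes at once. For an even number n of gametes and p = 0.5, Stirling's formula gives the approximation sqrt(2 / (pi * n)) for the probability of drawing exactly n/2 copies of A, so this probability shrinks roughly like 1/sqrt(n). A quick sketch comparing the exact values with that approximation (the particular values of n are arbitrary):

```r
# Probability of drawing exactly n/2 copies of A (i.e. keeping p = 0.5)
# for several even numbers of sampled gametes n.
n <- c(10, 100, 1000, 10000, 50000)
exact    <- dbinom(n / 2, n, 0.5)
stirling <- sqrt(2 / (pi * n))       # Stirling-based approximation
round(data.frame(n, exact, stirling), 5)
```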

According to the numbers above, it seems that the smaller the number of gametes drawn for the new generation, the higher the probability of keeping exactly the same allele frequency. This is only half the truth. Although the probability of keeping exactly p = 0.5 increases as the sample shrinks, so does the variance around this frequency: the new frequency of A has variance p(1 – p)/2N, so large departures from 0.5 also become much more likely. Let's graph these results to get a better sense of what I mean. The following is a little routine I wrote that calculates a series of binomial probabilities and plots them against allele frequency.

> sizes <- sort(c(2, 5, 10, 20, 50, 100, 500, 1000, 5000, 10000, 20000, 50000), decreasing = TRUE)
> par(mfrow = c(3, 4))   # 3 x 4 grid of panels, one per sample size
> for (i in sizes) {
    y <- dbinom(0:i, i, 0.5)   # P(exactly k copies of A), k = 0..i
    x <- (0:i) / i             # corresponding frequency of A in the sample
    plot(x, y, xlim = c(0, 1), col = "blue", ylab = "Probability", xlab = "Allele Frequency",
         main = paste0("Probability Mass Function\n0 <= k <= ", i, ", p = 0.5"))
    abline(v = 0.5, lty = 2, lwd = 3, col = "red")   # starting frequency p = 0.5
  }

The output is a nice collection of bell curves. From top left to bottom right, the sample size decreases from 50000 to 2 gametes.

In the panel titles, k denotes the number of successes (copies of allele A) in the sample.

These graphs should tell us two very important things. First, no matter how big the sample, there will always be some random change in allele frequency. Second, the magnitude of this change (the width of the bell curve) grows as the sample size shrinks.
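Both points can be made quantitative. Under the binomial model, the new frequency of A, k/n, has variance p(1 – p)/n, where n = 2N is the number of sampled gametes, so its standard deviation (the width of each curve) scales like 1/sqrt(n). A minimal sketch, using a departure of 0.1 or more from p = 0.5 as an arbitrary yardstick for a "large" change:

```r
p <- 0.5
n <- c(10, 100, 1000, 10000, 50000)   # number of sampled gametes (2N)

# Standard deviation of the new allele frequency k/n
sd_freq <- sqrt(p * (1 - p) / n)

# Probability that the new frequency departs from 0.5 by 0.1 or more.
# The 0.1 threshold is arbitrary; the small tolerance guards against
# floating-point error (in doubles, abs(0.4 - 0.5) is slightly below 0.1).
p_far <- sapply(n, function(m) {
  k <- 0:m
  sum(dbinom(k, m, p)[abs(k / m - p) >= 0.1 - 1e-9])
})

data.frame(n, sd_freq, p_far)
```

With ten gametes most outcomes land at least 0.1 away from 0.5, while with tens of thousands of gametes such a jump is, for practical purposes, impossible in a single generation.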

This is why genetic drift is generally considered a much stronger evolutionary force in small populations, where allele frequencies fluctuate widely and alleles can be driven to fixation or loss within a small number of generations.
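A minimal simulation sketch of this last point, assuming a Wright-Fisher-style model: constant population size N, non-overlapping generations, and pure drift with no selection or mutation. The helper `drift_sim`, the 100 generations and the 200 replicates per population size are illustrative choices, not part of the analysis above.

```r
# Hypothetical helper: follow the frequency of A for `gens` generations,
# resampling 2N gene copies from the previous generation each time.
drift_sim <- function(N, p0 = 0.5, gens = 100) {
  p <- p0
  for (g in seq_len(gens)) {
    p <- rbinom(1, 2 * N, p) / (2 * N)
    if (p == 0 || p == 1) break   # allele lost or fixed: no variation left to drift
  }
  p
}

set.seed(123)
for (N in c(10, 100, 1000)) {
  final <- replicate(200, drift_sim(N))
  absorbed <- mean(final == 0 | final == 1)   # share of replicates fixed or lost
  cat("N =", N, ": fraction fixed or lost after 100 generations =", absorbed, "\n")
}
```

With these settings, nearly all of the small populations (N = 10) should have lost or fixed A, while essentially none of the large ones (N = 1000) will have; the exact fractions depend on the seed.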