Mutational Biases In Critical Thinking


A fundamental biological question is what forces shape the guanine plus cytosine (GC) content of genomes. We studied the specificity and rate of different mutational biases in real time in the bacterium Salmonella typhimurium under conditions of strongly reduced selection and in the absence of the major DNA repair systems that counteract common spontaneous mutations caused by oxidized and deaminated DNA bases. The mutational spectrum was determined by whole-genome sequencing of two S. typhimurium mutants that were serially passaged for 5,000 generations. Analysis of 943 identified base pair substitutions showed that 91% were GC-to-TA transversions and 7% were GC-to-AT transitions, changes commonly associated with 8-oxoG and deamination damage, respectively. Other types of base pair substitutions constituted the remaining 2%. With regard to mutational biases, C-to-T transitions were significantly more frequent on the nontranscribed strand, and in highly expressed genes C/G-to-T mutations were more common than expected; however, no significant mutational bias with regard to leading and lagging strands of replication or chromosome position was found. These results suggest that, given the experimentally determined mutation rates and specificities, a bacterial genome lacking the relevant DNA repair systems could very rapidly reduce its GC content as a consequence of these underlying mutational biases.

A central question in evolutionary genomics is what mechanisms cause the variation in DNA base composition observed between and within genomes, and how rapidly these biases might change in response to, for example, altered ecology or genetic constitution of the organism. The large range in guanine plus cytosine (GC) content among bacterial species is well established, varying from at least 17% to 75% GC, with an even larger variation at the third codon position (1). Within a genome, GC content usually is quite homogeneous and has a strong phylogenetic signal. Despite this overall homogeneity, however, strand-specific biases frequently exist, such that the average nucleotide composition of each strand deviates from the theoretically expected A = T and G = C. For example, most bacterial chromosomes are relatively strongly enriched in G over C and in T over A, and are slightly depleted in G+C at weakly selected positions, in the leading strand compared with the lagging strand (2). In addition, highly transcribed genes appear to show a G and T skew on the nontranscribed strand compared with poorly transcribed genes (3).

Although the causes of these biases remain unclear, they can be expected to arise on at least three different levels. First, there might exist an underlying bias in the mutation pressure caused by unavoidable spontaneous DNA damage, such as deamination of C→U and 5-meC→T (4, 5) or oxidation of G to form 7,8-dihydro-8-oxoG (8-oxoG) (6). An example of such a bias is the deamination of C and 5-meC, which occurs more rapidly in single-stranded than in double-stranded DNA in vitro (7). During replication and transcription, the leading and nontranscribed strands are in a single-stranded state for a longer time than the lagging and transcribed strands, making them more prone to deamination (8, 9). Second, these underlying spontaneous mutational pressures might in turn be modulated by the specificity and efficiency of the different repair systems that remove deaminated and oxidized bases. With regard to deamination damage, two uracil glycosylases, encoded by the ung and mug genes, remove uracil from DNA, leaving an abasic site that can be restored by repair DNA synthesis (10–13), and the Vsr endonuclease encoded by the vsr gene initiates the very short patch repair system that removes thymine from a G-T mispair (14). The common oxidized base 8-oxoG is counteracted by two highly conserved enzymes, MutM and MutY. MutM is a glycosylase that excises 8-oxoG paired with C, thereby initiating base excision repair that restores the GC base pair. If this does not happen before replication, 8-oxoG can mispair with A. This mispair is a substrate for another glycosylase, MutY, which removes the adenine and allows subsequent repair DNA synthesis (15). Mutants defective in mutM or mutY have an elevated rate of GC-to-TA transversion mutations (15). Finally, a GC bias could be introduced by selection. Several theories emphasizing the role of selection in affecting genomic GC content have been suggested (16–21). Not surprisingly, given the complexities of bacterial ecology, none of these theories provides a universal explanation of the selective forces that would favor a high or a low GC content.

A central and unanswered question in this context is how genomic GC content might change in response to alterations in the genetic capacity of the cell to repair different types of mutations. If an organism loses its ability to repair deamination and oxidative damages, how rapidly can the genomic GC content change, and what are the dominant mutational biases? These questions previously have been addressed primarily by analysis of sequenced genomes, making it difficult to distinguish between selection and neutral processes. Furthermore, experimental studies typically have been limited by the use of single genes instead of genomes or by experimental conditions that do not fully exclude the influence of selection. To avoid these limitations, we chose an experimental evolution approach in which mutational pressures were assessed in real time under conditions of strongly reduced selection and in the absence of the major DNA repair systems for deaminated and oxidized bases. Furthermore, to avoid any local biases in mutational pattern, we analyzed the mutational spectrum at the genome level by high-coverage DNA sequencing of two complete Salmonella typhimurium genomes. This approach allowed us to examine the presence of local and global mutational biases and to experimentally determine both the rate and the nature of the underlying mutational biases.


Experimental Rationale.

To address the question of mutational biases, random point mutations were allowed to accumulate in the genomes of a wild-type strain and four different S. typhimurium LT2 mutant strains constructed using linear transformation and phage P22 transduction (see Materials and Methods). Mutation accumulation was achieved by serially passaging the repair-deficient mutants, repeatedly streaking them on rich agar plates to generate new colonies, each initiated by a single cell. For each of the five strains, 12 lineages were serially passaged each day for 200 days. During each serial passage, the bacterial population expanded from 1 cell to ≈10⁸ cells in a colony, representing ≈5,000 generations of growth for the entire experiment (200 serial passages × 25 generations/passage). The repeated single-cell bottleneck increased genetic drift and allowed all types of mutations to come to fixation with high and similar probabilities. The genotypes of the four mutants tested were ung, ung vsr, mutM mutY, and ung vsr mug mutM mutY, with the isogenic wild-type S. typhimurium LT2 used as a control. Because inactivation of ung, vsr, and mug results in increased GC-to-AT transitions (4, 5), whereas inactivation of mutM and mutY causes increased GC-to-TA transversions (15), the effects of these two groups of repair genes could be distinguished by their mutational signatures even in a single strain with all five genes inactivated. To confirm that the mutation rate did not change during the experiment, mutation rates to rifampicin resistance were measured for the ancestral strains and for the lineages after 200 cycles of serial passage (Fig. 1). No significant change in mutation rate was observed between the ancestral and evolved lineages, but one lineage of the quintuple mutant was excluded from further analysis because it was unable (for unknown reasons) to grow to high cell density in Luria-Bertani (LB) broth.

Fig. 1.

Mutation rates to rifampicin resistance for the ancestral strains and for the lineages after 200 growth cycles. Closed circles represent the ancestral strains before mutation accumulation; open circles represent the evolved strains. Mutation rates were increased 6-fold in the ung- and ung- vsr- mutants and ≈100-fold in the mutM- mutY- and ung- vsr- mug- mutM- mutY- mutants compared with the wild type (3 × 10⁻⁹), with no significant differences between ancestral and evolved strains.

DNA Sequence Analysis.

To identify mutations in the 4.95-Mbp S. typhimurium genome, independent lineages of serially passaged wild-type and mutant bacteria were analyzed by whole-genome DNA sequencing using 454 technology. The genomes of two randomly chosen lineages of the ung- vsr- mug- mutM- mutY- quintuple mutant evolved for 200 cycles were sequenced to 8.7× and 7.8× depth. Sequences were assembled into contigs covering about 4.8 Mbp (97%) of the 4.95-Mbp S. typhimurium genome, including the pSLT plasmid. The genome of an evolved wild-type lineage was sequenced at lower coverage (4.9× depth) to confirm that deletion of the repair systems was the cause of the observed mutation accumulation. The contigs were compared with the published S. typhimurium LT2 genome sequence using genomic BLASTn, allowing identification of the position and nature of all mutations. Single base pair deletions and insertions were not counted, because sequencing technology-dependent uncertainties in base calling within mononucleotide runs prevented exclusion of false positives. Before further analyses, the pSLT plasmid, repeats, and duplicated regions were removed, leaving 4.68 Mbp of the chromosome. Differences between the published genome sequence and the sequenced lineages were excluded from further analysis if identical mutations were found in at least two of the evolved lineages, including the evolved wild-type lineage. A total of 68 differences were removed [see supporting information (SI) Text for details]. This decreased the likelihood that differences between our ancestral strain of S. typhimurium and the sequenced reference strain were counted as mutations and also removed any mutations introduced during strain construction. BLASTx analyses of regions surrounding the mutations were performed to identify the mutated genes and the coding context of all mutations.

Mutational Spectra and Biases.

In the two sequenced quintuple mutant (ung vsr mug mutM mutY) genomes, a total of 943 base pair substitutions (BPSs) were found, of which 856 (91%) were GC-to-TA transversions and 65 (7%) were GC-to-AT transitions, likely caused by 8-oxoG and deamination, respectively. Other types of BPSs accounted for the remaining 22 (2%) of the mutations (Table 1). Of the mutations found, 96% were called at quality Q40 or higher (error probability ≤10⁻⁴, i.e., ≥99.99% accuracy), only slightly lower than the corresponding fraction for the entire sequence, suggesting that sequencing errors had very limited influence on our results (see Materials and Methods). No apparent differences in the number and types of mutations were found between the two sequenced strains (Table S1). In the serially passaged wild-type control strain, 15 BPSs were found in the 4.31 Mb of analyzed sequence, and no mutational bias toward GC-to-TA transversions or GC-to-AT transitions was noted (Table S1), suggesting that ≈98% of the mutations that accumulated during serial passage of the quintuple mutant strain resulted from deletion of the repair systems. To investigate the presence of various mutational biases, GC-to-TA transversions and GC-to-AT transitions were analyzed (Table 2). Surprisingly, no significant bias between the leading and lagging strands of replication was found for either G-to-T or C-to-T mutations (Table 1). To detect any bias in chromosome position and any mutational hot spots, Kolmogorov-Smirnov tests against a uniform distribution were performed; no significant deviations were found (P = .45 for G-to-T, P = .14 for C-to-T). The distribution of mutations with regard to chromosomal location is shown in Fig. 2. Data for all of the mutations found, including type of mutation, genomic position, strand of replication and transcription, gene, protein, amino acid substitution, codon change, and strain, are given in Table S2.
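The uniformity test described above can be sketched as follows. This is a minimal pure-Python one-sample Kolmogorov-Smirnov test of mutation positions against a uniform distribution over the analyzed chromosome length. It is an illustration under stated assumptions, not the authors' R code: the p-value comes from the asymptotic Kolmogorov distribution with the small-sample correction given in Numerical Recipes, which can differ from the exact p-values R reports for small samples, and the positions in the example are synthetic, not the data in Table S2.

```python
import math
from typing import Sequence, Tuple


def kolmogorov_q(lam: float) -> float:
    """Asymptotic survival function of the Kolmogorov distribution, Q(lambda)."""
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) <= 1e-10 * abs(total):
            return min(1.0, max(0.0, total))
    return 1.0  # series did not converge: lambda is tiny, so p is ~1


def ks_uniform(positions: Sequence[float], length: float) -> Tuple[float, float]:
    """One-sample KS test of positions against Uniform(0, length).

    Returns (D, approximate p-value). Positions are linear chromosome
    coordinates; the circularity of the chromosome is ignored.
    """
    n = len(positions)
    if n == 0:
        raise ValueError("need at least one position")
    u = sorted(p / length for p in positions)  # rescale to [0, 1)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    d = max(d_plus, d_minus)
    sqrt_n = math.sqrt(n)
    lam = (sqrt_n + 0.12 + 0.11 / sqrt_n) * d  # Numerical Recipes correction
    return d, kolmogorov_q(lam)


ANALYZED_BP = 4_680_000  # chromosome length retained after filtering (from the text)
# Synthetic example: 100 evenly spaced positions should look uniform.
even = [(i + 0.5) * ANALYZED_BP / 100 for i in range(100)]
print(ks_uniform(even, ANALYZED_BP))
```

A small p-value would indicate clustering (for example, a hot spot near the origin or terminus); a large one, as reported for both mutation classes, is consistent with a uniform distribution.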

Fig. 2.

Empirical cumulative distribution functions of (A) GC-to-TA transversions and (B) GC-to-AT transitions. If mutations are distributed randomly with regard to chromosomal location, then a linear relationship is expected, with no deviations at the origin of replication (4.08 Mbp) or terminus (1.61 Mbp).

To identify any other potential biases in the mutational spectrum, the expected number of each codon change under a random distribution of mutations was calculated, taking into account the frequency of the native codon in the S. typhimurium genome. This gave 96 possible codon changes each for transitions and for transversions. This expected distribution was then compared with the observed codon changes. The numbers of G-to-T mutations in the nontranscribed and transcribed strands did not differ significantly from those expected by chance; however, of the 60 C-to-T transitions located either in coding regions or less than 40 base pairs upstream or downstream of them, 39 were found on the nontranscribed strand and 21 on the transcribed strand. When the overrepresentation of G over C in the nontranscribed strand is taken into account, 28 mutations in the nontranscribed strand and 32 in the transcribed strand would be expected. Our data thus indicate a significant increase in C-to-T transitions on the nontranscribed strand (P < .05, Fisher's exact test).
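The strand comparison can be illustrated with a short calculation. The paper reports a Fisher's exact test; this sketch swaps in a different test, an exact two-sided binomial test of the observed 39 nontranscribed-strand C-to-T transitions out of 60 against the expected proportion 28/60, so its p-value need not match the published one.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test.

    Sums P(X = i) over every outcome i that is no more likely than the
    observed k.
    """
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Numbers from the text: 60 C-to-T transitions in or near coding regions,
# 39 on the nontranscribed strand, 28 of 60 expected there by chance.
n_total, observed_nts, expected_nts = 60, 39, 28
p_value = binom_test_two_sided(observed_nts, n_total, expected_nts / n_total)
print(f"two-sided exact binomial p = {p_value:.4f}")
```

Treating the expected split as a fixed null proportion, as here, uses the compositional correction directly; a 2 × 2 Fisher test on observed versus expected counts is a different (and more conservative) formulation.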

A set of highly expressed genes with a codon adaptation index higher than 0.66 (Table S3), comprising 78 kilobase pairs (kbp), was extracted from the HEG database and used to investigate the influence of expression on mutation rates (22). Comparing the 60 C-to-T transitions with the list of highly expressed genes revealed 7 mutations in those genes, 12% of all C-to-T transitions. Because this 78-kbp mutational target represents only 1.7% of the analyzed sequence, about one such mutation would be expected by chance; the sevenfold excess is significant (P < .05, Fisher's exact test). For G-to-T transversions, 42 of the 856 mutations were found in the highly expressed genes, compared with the 14 expected, which makes the increase in G-to-T transversions in these genes highly significant as well (P < .0001, Fisher's exact test).
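The expected counts and the size of the excess follow directly from the target fraction. This minimal sketch reproduces the arithmetic from the text and, as a swapped-in alternative to the Fisher's exact tests used in the paper, reports a one-sided binomial tail probability P(X ≥ observed), assuming mutations land uniformly on the 4.68 Mbp of analyzed sequence.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(1.0, total)


ANALYZED_BP = 4_680_000  # chromosome length analyzed (from the text)
HEG_BP = 78_000          # highly expressed genes, CAI > 0.66 (from the text)
target = HEG_BP / ANALYZED_BP  # ~1.7% of the analyzed sequence

for label, hits, total in [("C-to-T", 7, 60), ("G-to-T", 42, 856)]:
    expected = total * target
    tail = binom_upper_tail(hits, total, target)
    print(f"{label}: observed {hits}, expected {expected:.1f}, "
          f"fold {hits / expected:.1f}, P(X >= {hits}) = {tail:.2e}")
```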

No significant biases in terms of nonsynonymous, synonymous, or nonsense mutations were found for either transversions or transitions (Table 2). Mutations were no less likely to occur in essential genes than in the rest of the chromosome, based on 301 genes classified as essential in Escherichia coli (Table S4; P = .19, Fisher's exact test). This suggests that purifying selection was strongly reduced during the experiment and that deleterious mutations accumulated at random with limited constraint from their fitness effects, so that the fixation rate was close to the mutation rate.

Base Pair Substitution Mutation Rates.

During every growth cycle, there were ≈25 cell divisions, giving a total of 5,000 generations during the mutation accumulation experiment. The BPS rate of the quintuple mutant was calculated by dividing the average number of mutations per genome by the number of generations. Because the mutational spectrum of the quintuple mutant (ung, vsr, mug, mutM, and mutY) was dominated by BPSs, the total mutation rate could be approximated by the BPS rate. The mutation rate was 0.094 mutations per genome per generation, or 2.0 × 10⁻⁸ per base pair per generation. Reducing the GC content of the S. typhimurium chromosome by 1% would require ≈48,500 GC-to-AT mutations. At the mutation rate calculated above, this could occur in about 500,000 generations, provided that the relevant repair systems were absent and that selection was reduced. Assuming one generation per day in nature, this would take about 1,400 years, a very short time span from an evolutionary perspective. The BPS rate of wild-type S. typhimurium can also be estimated from the sequencing data, but only as an upper limit, because sequencing errors influence this result to a greater degree (see Materials and Methods). Calculated as described above, the wild-type mutation rate is 3.4 × 10⁻³ BPSs per genome per generation, or 7.0 × 10⁻¹⁰ per base pair per generation. The rifampicin resistance fluctuation test showed an ≈100-fold increase in the mutation rate of the quintuple mutant (Fig. 1). This is reasonably consistent with the 28-fold increase calculated here, because the sequencing-based wild-type rate is an upper limit, which makes the 28-fold figure a lower bound.
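The rate arithmetic above can be reproduced in a few lines. This is a sketch using only figures quoted in the text; like the text, it treats every BPS as a GC-to-AT change when estimating the time needed for a 1% drop in GC content (about 98% of the substitutions start at a G:C pair).

```python
# Figures quoted in the text (quintuple mutant unless noted otherwise).
BPS_FOUND = 943             # substitutions in the two sequenced lineages
LINEAGES = 2
GENERATIONS = 200 * 25      # 200 passages x ~25 generations per passage
ANALYZED_BP = 4_680_000     # chromosome length left after filtering
GC_TO_AT_FOR_1PCT = 48_500  # changes needed to lower GC content by 1%

# Average BPSs per genome divided by the number of generations.
per_genome = BPS_FOUND / LINEAGES / GENERATIONS  # ~0.094
per_bp = per_genome / ANALYZED_BP                # ~2.0e-8

# Treats every BPS as a GC-to-AT change, as the text does.
gens_for_1pct = GC_TO_AT_FOR_1PCT / per_genome   # ~5e5 generations
years_for_1pct = gens_for_1pct / 365             # at one generation per day

# Wild-type control: 15 BPSs in 4.31 Mb over the same 5,000 generations.
wt_per_bp = 15 / GENERATIONS / 4_310_000         # ~7.0e-10, an upper limit
fold_increase = per_bp / wt_per_bp               # a lower bound on the true fold

print(f"{per_genome:.4f} per genome, {per_bp:.2e} per bp per generation")
print(f"1% GC drop: ~{gens_for_1pct:,.0f} generations, ~{years_for_1pct:,.0f} years")
print(f"wild type {wt_per_bp:.2e} per bp; fold increase {fold_increase:.1f}")
```

With unrounded inputs the fold increase comes out at about 29; the text's 28 follows from the rounded values 2.0 × 10⁻⁸ and 7.0 × 10⁻¹⁰.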

Fitness Loss During Serial Passage.

As expected, the average fitness decreased and the fitness variance increased with time for all five strains examined (Fig. 3 A and B). The rate of fitness loss of the ung- mutant was similar to that of the wild-type strain; it was 2-fold higher for ung- vsr- and 12-fold higher for mutM- mutY- and ung- vsr- mug- mutM- mutY-. From the fitness decrease of the quintuple mutant during the experiment (1.37 × 10⁻⁴ per generation), the average fitness loss per mutation could be estimated as 0.00145.
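The per-mutation estimate is the fitness loss per generation divided by the number of BPSs gained per genome per generation. A minimal sketch, assuming (as the estimate implicitly does) that all of the fitness loss is caused by the counted BPSs and that their effects add up:

```python
FITNESS_LOSS_PER_GENERATION = 1.37e-4           # quintuple mutant, from the text
BPS_PER_GENOME_PER_GENERATION = 943 / 2 / 5000  # ~0.094, from the BPS counts

# Fitness loss per generation divided by mutations gained per generation.
loss_per_bps = FITNESS_LOSS_PER_GENERATION / BPS_PER_GENOME_PER_GENERATION
print(f"average fitness loss per BPS: {loss_per_bps:.5f}")
```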

Fig. 3.

(A) Mean fitness of the evolved strains, measured as the generation time during exponential growth in rich growth medium normalized by the generation time of the ancestral wild type included in each experiment. (B) Variance in fitness between lineages of the evolved strains.

Table 1.

Mutational spectrum of base substitutions and calculated fixation rates

Table 2.

Substitution patterns of G → T and C → T mutations


This study demonstrates for the first time how, in the absence of the repair systems that normally counteract the intrinsic forces of deamination and oxidation, an underlying mutational bias could rapidly change the global genomic base composition. The experimental setup, with population bottlenecks and a resulting strong reduction of selection, allowed us to study the presence of several important mutational biases that have been used to explain the characteristics of base composition in bacterial genomes. In addition, the use of complete genome sequences makes the results more general, because it reduces the risk of false conclusions associated with experimental systems based on a single gene. The majority of the mutations found in this study were those associated with oxidation of guanine (91%); mutations associated with deamination of cytosine (7%) contributed much less to the change in GC content. By determining the number of accumulated mutations and the associated fitness reduction, we could estimate both the genomic mutation rate and the average fitness loss per mutation. Our estimate of the total genomic mutation rate for wild-type S. typhimurium, 3.4 × 10⁻³ per genome per generation, is very close to the value of about 3 × 10⁻³ per genome per generation determined by Drake (23) from lacI and the his operon in E. coli. Our estimate of the average fitness loss of 1.5 × 10⁻³ per BPS is an order of magnitude lower than the upper-bound estimates of the average deleterious mutational effect in E. coli reported by Kibota and Lynch (24) and in S. typhimurium reported by Maisnier-Patin et al. (25). This difference can be explained in part by the fact that we consider only BPSs, disregarding other types of mutations.

With regard to the occurrence of various mutational biases, we report several significant observations. Highly expressed genes had a higher mutation rate for both transitions and transversions than the rest of the genome, suggesting that transcription can influence compositional biases. The phylogenetic observation that highly expressed genes show lower divergence seems to argue against this interpretation, but these genes appear to have far fewer neutral or nearly neutral sites, which limits the fixation of mutations in them (26–28). Furthermore, C-to-T transitions likely caused by deamination of cytosine were more common on the nontranscribed strand of coding regions, whereas no such bias was found for G-to-T transversions. These results support previous studies on single genes in which transcription was reported to increase mutation rates and a bias between the transcribed and nontranscribed strands for C-to-T transitions was observed (29, 30). The rate constants for cytosine deamination have been determined in vitro to be 10⁻¹⁰ per second in single-stranded DNA and 7 × 10⁻¹³ per second in double-stranded DNA under conditions similar to those of our experiments (pH 7.4; 37 °C) (7). If we assume that half of the deaminations will be fixed after replication, which is reasonable because the complementary strand still carries the correct base, we can use these rate constants to estimate the number of mutations expected in our experiment. After 200 days, we would then expect about 15 mutations per genome if the DNA were always double-stranded and about 2,100 mutations if it were always single-stranded. The 32.5 GC-to-AT mutations per genome observed in this study exceed the double-stranded expectation, suggesting that the time spent in the single-stranded state can influence strand bias. Assuming that the excess C-to-T transitions result from deamination only, the time spent in the single-stranded state can be estimated as (32.5 − 15)/(2,100 − 15) ≈ 0.8% of the total time.
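The deamination estimate can be made explicit as a linear mixture: observed = f · E_ss + (1 − f) · E_ds, solved for the single-stranded fraction f. In this sketch the number of cytosines is an assumption chosen here, not stated in the text: the analyzed length (4.68 Mbp) times an approximate GC fraction of 0.52 for LT2, counting one C per G:C pair. These inputs reproduce the ≈15 and ≈2,100 figures.

```python
SECONDS = 200 * 24 * 3600  # 200 daily passages
K_SS = 1e-10               # cytosine deamination, single-stranded DNA (per s)
K_DS = 7e-13               # cytosine deamination, double-stranded DNA (per s)
ANALYZED_BP = 4_680_000    # analyzed chromosome length (from the text)
GC_FRACTION = 0.52         # assumption: approximate GC content of LT2
FIXED_FRACTION = 0.5       # half of deaminations fixed after replication
OBSERVED = 65 / 2          # GC-to-AT transitions per genome (32.5)

cytosines = ANALYZED_BP * GC_FRACTION  # one C per G:C pair, both strands
expected_ds = FIXED_FRACTION * K_DS * SECONDS * cytosines  # ~15
expected_ss = FIXED_FRACTION * K_SS * SECONDS * cytosines  # ~2,100
# Linear mixture: OBSERVED = f * expected_ss + (1 - f) * expected_ds
f_single = (OBSERVED - expected_ds) / (expected_ss - expected_ds)

print(f"ds {expected_ds:.1f}, ss {expected_ss:.0f}, single-stranded {f_single:.2%}")
```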

Previously observed compositional biases between the leading and lagging strands of replication have been explained by mutational biases associated with the asymmetry of replication (31). Somewhat unexpectedly, under our experimental setup we found no asymmetry between the two strands for either C-to-T or G-to-T mutations. Potential explanations for this finding include the possibilities that the bias was too weak to detect with our experimental approach, that the asymmetry is not linked to the formation of these types of damage, or that the bias is related to an asymmetry conferred by the inactivated DNA repair systems. Regardless of the explanation for the lack of an observable leading-lagging bias, these data suggest that under our experimental setup, the transcription-associated biases were more pronounced than the replication-associated biases. Furthermore, we detected no effect of chromosomal location on mutation rate, contrary to the suggestion of Sharp et al. (27), who reported that synonymous mutations were more frequent in the terminus region. Thus, for DNA damage likely caused by deamination and oxidation of bases, no obvious hot spot regions were found, and the mutations appeared to be randomly distributed with respect to chromosome location.

Finally, these results are relevant for understanding the AT richness of the reduced genomes of intracellular parasites and endosymbionts. This AT richness has been explained by relaxed selective constraints, caused by passage through small population bottlenecks, combined with the loss of DNA repair systems, including ung, mutM, and mutY, during reductive evolution (32, 33). Our experiments mimic this evolutionary process and support the importance of these repair systems in maintaining genomic GC content. Furthermore, as discussed above, the loss of these repair systems could potentially result in a very rapid reduction in GC content. As shown in Table 3, all of the AT-rich small genomes lack all or some of the relevant repair genes, and it is conceivable that their reduced GC content is, at least in part, a consequence of the stronger GC-to-AT mutational bias that results from ubiquitous oxidation and deamination damage. A similar reasoning might also apply to protozoans (e.g., Plasmodium) that have AT-rich genomes and also lack many of the relevant repair genes (34).

Table 3.

Absence and presence of genes associated with DNA repair in bacteria with small genomes and low GC content

Materials and Methods

Strains and Media.

S. enterica serovar Typhimurium LT2 (designated S. typhimurium here) and its derivatives were used in all experiments. The liquid medium was LB broth, and the solid medium was LB agar, supplemented where required with kanamycin (30 or 50 mg/L) and ampicillin (100 mg/L) for selection and plasmid maintenance.

Construction of Repair-Deficient Mutants.

Gene deletions in the S. typhimurium chromosome were made using the Lambda Red system as described previously (35). A kanamycin-resistance cassette flanked by FLP recombinase target sequences (FRTs) was amplified from template plasmid pKD4 (GenBank accession AY048743) with primers containing 40-bp extensions homologous to regions in or adjacent to the ung, vsr, mug, mutM, and mutY genes, and the products were used to transform the Lambda Red strain. The deletions were sequentially moved to S. typhimurium LT2 by phage P22 transduction, and the FRT-flanked kanamycin cassette was removed with a helper plasmid (pCP20) expressing the site-specific FLP recombinase (35). All constructs were verified by colony polymerase chain reaction with primers outside the deleted regions. A detailed description of strain construction is given in the SI, and all primers used are listed in Table S5.

Mutation Accumulation Experiment.

Twelve independent lineages each of wild-type LT2, ung::kan, Δung vsr::kan, ΔmutM mutY::kan, and Δung Δmug ΔmutM ΔmutY vsr::kan were used for the mutation accumulation experiment. Every 24 h for 200 cycles, each lineage was passaged through a random single-cell bottleneck on LB agar plates supplemented with 0.2% glucose at 37 °C, always picking the last visible colony in the streak irrespective of its size or appearance.

Growth Rate Measurements.

Exponential growth rates were measured at 37 °C in LB broth. One μL of an overnight culture was used to inoculate 2 ml of LB broth; 350 μl of this was loaded into each well, and absorbance at 600 nm was recorded every 4 min by a BioscreenC reader (Labsystems). All growth rates were normalized to the growth rate of the ancestral S. typhimurium LT2 included in each experiment. Growth rates were measured in two separate experiments in quadruplicate.
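The growth-rate normalization can be sketched as follows: fit ln(OD600) against time over the exponential phase by least squares, take the slope as the growth rate, and divide it by the rate of the ancestor measured in the same run. This is a minimal sketch assuming the exponential window has already been chosen; the function names are hypothetical, the BioscreenC software and the authors' own analysis may compute rates differently, and the readings in the example are synthetic.

```python
import math
from typing import Sequence


def growth_rate(times_min: Sequence[float], od600: Sequence[float]) -> float:
    """Least-squares slope of ln(OD600) versus time (per minute)."""
    ys = [math.log(v) for v in od600]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def relative_growth_rate(evolved_rate: float, ancestor_rate: float) -> float:
    """Growth rate normalized to the ancestor measured in the same experiment."""
    return evolved_rate / ancestor_rate


# Synthetic readings every 4 min: evolved doubles in 30 min, ancestor in 25 min.
times = [4 * i for i in range(10)]
evolved = [0.01 * math.exp(math.log(2) / 30 * t) for t in times]
ancestor = [0.01 * math.exp(math.log(2) / 25 * t) for t in times]
r_evo, r_anc = growth_rate(times, evolved), growth_rate(times, ancestor)
print(f"generation time {math.log(2) / r_evo:.1f} min, "
      f"relative growth rate {relative_growth_rate(r_evo, r_anc):.3f}")
```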

Mutation Rate Determination.

Mutation rates were determined for the ancestral strains and for the evolved lineages after 200 cycles. Approximately 10³ cells were used to inoculate each of 10 replicate cultures of 220 μl of LB broth, which were grown for 24 h at 37 °C with shaking (200 rpm) in 10-ml tubes. Then 200 μl of each overnight culture was spread on LB agar plates containing 100 mg/L rifampicin, and suitable dilutions were spread on LB agar plates without rifampicin to determine the number of viable cells. Colonies were counted after incubation at 37 °C for 30 h, and mutation rates were calculated using the Lea-Coulson method of the median (for low or moderate numbers of mutations) or the Drake formula (for high numbers of mutations) as described previously (36).
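The Lea-Coulson method of the median estimates the number of mutations per culture, m, from the median number of mutants r by solving r/m − ln(m) = 1.24. A minimal sketch: because the left-hand side decreases strictly in m, bisection finds the unique root. The final division m/N by the number of cells per culture is one common convention, assumed here; the Drake formula mentioned above is not implemented, and the function names are hypothetical.

```python
import math


def lea_coulson_m(median_mutants: float) -> float:
    """Solve r/m - ln(m) = 1.24 for m by bisection (method of the median)."""
    r = median_mutants
    if r <= 0:
        raise ValueError("the median number of mutants must be positive")

    def g(m: float) -> float:
        return r / m - math.log(m) - 1.24  # strictly decreasing in m

    lo, hi = 1e-12, r + 1.0  # g(lo) > 0 and g(hi) < 0 for any r > 0
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mutation_rate(median_mutants: float, cells_per_culture: float) -> float:
    """Mutations per cell per generation, approximated as m / N."""
    return lea_coulson_m(median_mutants) / cells_per_culture


# Hypothetical example: median of 35 rifampicin-resistant mutants per culture,
# 2e9 viable cells per culture.
print(f"m = {lea_coulson_m(35):.2f}, rate = {mutation_rate(35, 2e9):.2e}")
```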

DNA Sequencing and Analysis.

Genomic DNA was prepared from two lineages of the Δung Δmug ΔmutM ΔmutY vsr::kan strain and one lineage of the wild-type strain evolved for 200 cycles using a Genomic Tip 100G (Qiagen) according to the manufacturer's instructions. The lineages were chosen with a random number generator among the lineages whose mutation rates were similar to that of their ancestor. (Three lineages of the quintuple mutant were excluded.) Genome sequencing was performed with a Genome Sequencer FLX (Roche) at the KTH Sequencing Facility, Royal Institute of Technology, KTH, Stockholm, Sweden. Contigs longer than 500 bp were used for BLAST searches against the reference S. typhimurium LT2 genome to find mutations. For the two sequenced quintuple mutants, 97% of the bases had quality scores of Q40 or higher, where Q40 corresponds to an error probability of 10⁻⁴ (99.99% accuracy). Of the mutations found, 96% were Q40 or higher bases, only slightly lower than the fraction for the entire sequence, suggesting that sequencing errors had only a limited influence on our results. In the sequenced evolved wild-type strain, 95% of the bases were Q40 or higher, but only 47% of the mutations were, suggesting a larger fraction of false positives. Statistical analyses were performed using the R statistics package (R Project for Statistical Computing;
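The relation between Phred quality scores and error probabilities used above is Q = −10 log₁₀(P_error). A minimal sketch of the conversion in both directions (function names are hypothetical):

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p_error: float) -> float:
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p_error)


for q in (20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.2f}%")
```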


This work was supported by grants from the Swedish Research Council and Uppsala University (to D.I.A.). We thank Otto Berg, Linus Sandegren, and Staffan Svärd for comments and a critical reading of the manuscript.


  • 1To whom correspondence should be addressed. E-mail: dan.andersson{at}
  • Author contributions: P.A.L. and D.I.A. designed research, P.A.L. performed research, P.A.L. and D.I.A. analyzed data, and P.A.L. and D.I.A. wrote the paper.

  • The authors declare no conflicts of interest.

  • This article is a PNAS Direct Submission.

  • This article contains supporting information online at

  • © 2008 by The National Academy of Sciences of the USA


Mutation-biased adaptation reaches the mainstream

October 21, 2015 / Arlin Stoltzfus

The most recent issue of PNAS includes a report by Galen et al. linking enhanced mutation at a CpG site to altitude adaptation in Andean house wrens (Troglodytes aedon), based on clear biogeographic and biochemical evidence of adaptation.  I’ve been waiting for this, both in the narrow sense that I’ve been waiting for this particular study to appear in print, and also in the broader sense that I have been waiting for any paper on mutation-biased adaptation to appear in a prominent venue.  Results like these, one hopes, will overturn the “raw materials” doctrine of neo-Darwinism and stimulate the development of a new understanding of the role of mutation in evolution.

Some parts of this story need further work, as suggested in the PNAS commentary.

However, I want to discuss broader implications, rather than dwell on uncertainties.  So, I’m going to assume that Galen et al. are right, and that we see the Ile55 change in high-altitude wrens partly because it is beneficial, and partly because its occurrence is favored by a roughly 10-fold higher rate of mutation.  That is, the higher mutation rate probabilized this particular adaptive change.

Causal mutations identified by Pepin and Wichman in laboratory adaptation show a bias toward transitions. The right column indicates that, for some of these, the change could be a hitch-hiker rather than a driver. If we cross those out (lines), the remaining changes show a significant bias of 13 transitions to 5 transversions (null expectation is roughly 1:2).

Furthermore, I’m going to assume that this is just the tip of the iceberg.   If we look at other cases, we’ll see more evidence for a role of CpG hotspots.  We’ll find that this effect applies to other types of mutation biases, e.g., transition-transversion bias, insertion-deletion bias, and so on. Furthermore, we will find that, when objective measures of effect-size are applied, the effects of mutational bias will be substantial.  When I make this suggestion, it is not idle speculation.  I’ve been tracking the evidence for years, and I reviewed some of it previously in The Revolt of the Clay and a followup (this is a more extensive list than appears in the PNAS commentary).  For instance, in a study that I neglected to mention previously, Pepin and Wichman (2008) carry out repeated adaptation of phiX174, and the results show a clear bias toward transitions (above Table).
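For anyone who wants to check the “significant bias” in the Pepin and Wichman table, here is a quick sketch. It assumes the null model behind the 1:2 expectation: all 12 possible single-base changes equally likely, 4 of which are transitions, so a change is a transition with probability 1/3. It then runs an exact two-sided binomial test on 13 transitions versus 5 transversions; this is my illustration, not necessarily the test anyone else used.

```python
import math


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


transitions, transversions = 13, 5
# Assumption: all 12 single-base changes equally likely, 4 of them transitions.
p_null = 4 / 12
p_value = binom_test_two_sided(transitions, transitions + transversions, p_null)
print(f"{transitions}:{transversions} vs 1:2 null, two-sided p = {p_value:.4f}")
```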

Excavating the memory hole

If all that is true, the implications for evolutionary theory are staggering.  If “staggering” sounds like an exaggeration, a brief history lesson should set the record straight.  Let’s review what leading authorities said throughout the latter half of the 20th century about the role of mutation and the influence of mutation rates (sources):

“The large number of variants arising in each generation by mutation represents only a small fraction of the total amount of genetic variability present in natural populations. … It follows that rates of evolution are not likely to be closely correlated with rates of mutation . . . Even if mutation rates would increase by a factor of 10, newly induced mutations would represent only a very small fraction of the variation present at any one time in populations of outcrossing, sexually reproducing organisms.” (Dobzhansky, et al., 1977, p. 72)

“mutations are rarely if ever the direct source of variation upon which evolutionary change is based. Instead, they replenish the supply of variability in the gene pool which is constantly being reduced by selective elimination of unfavorable variants. Because in any one generation the amount of variation contributed to a population by mutation is tiny compared to that brought about by recombination of pre-existing genetic differences, even a doubling or trebling of the mutation rate will have very little effect upon the amount of genetic variability available to the action of natural selection. Consequently, we should not expect to find any relationship between rate of mutation and rate of evolution. There is no evidence that such a relationship exists.” (my emphasis) (Stebbins, 1966, p. 29)

“Those authors who thought that mutations alone supplied the variability on which selection can act, often called natural selection a chance theory. They said that evolution had to wait for the lucky accident of a favorable mutation before natural selection could become active. This is now known to be completely wrong. Recombination provides in every generation abundant variation on which the selection of the relatively better adapted members of a population can work.” (Mayr, 1994, p. 38)

“The process of mutation supplies the raw materials of evolution, but the tempo of evolution is determined at the populational levels, by natural selection in conjunction with the ecology and the reproductive biology of the group of organisms” (Dobzhansky, 1955, p. 282)

“It is most important to clear up first some misconceptions still held by a few, not familiar with modern genetics: (1) Evolution is not primarily a genetic event. Mutation merely supplies the gene pool with genetic variation; it is selection that induces evolutionary change.” (Mayr, 1963, p. 613)

“if ever it could have been thought that mutation is important in the control of evolution, it is impossible to think so now; for not only do we observe it to be so rare that it cannot compete with the forces of selection but we know this must inevitably be so.” (Ford, 1971, p. 361)

“Each unitary random variation is therefore of little consequence, and may be compared to random movements of molecules within a gas or liquid. Directional movements of air or water can be produced only by forces that act at a much broader level than the movements of individual molecules, e.g., differences in air pressure, which produce wind, or differences in slope, which produce stream currents. In an analogous fashion, the directional force of evolution, natural selection, acts on the basis of conditions existing at the broad level of the environment as it affects populations.” (Dobzhansky, et al., 1977, p. 6)

“Novelty does not arise because of unique mutations or other genetic changes that appear spontaneously and randomly in populations, regardless of their environment. Selection pressure for it is generated by the appearance of novel challenges presented by the environment and by the ability of certain populations to meet such challenges.” (Stebbins, 1982, p. 160)

According to this theory, every species has a “gene pool” that serves as a kind of dynamic buffer, soaking up and maintaining variation so that selection never has to wait for a new mutation.  This buffer effectively insulates evolution from effects of mutation.  As a result, mutations do not play a direct role in evolution, and they do not initiate change; rates of mutation are not determinative, so we don’t expect to see any correlation of the rate of evolution with the mutation rate.  In general, mutation merely supplies raw materials, while selection is a higher cause, acting at a higher level to determine the direction and rate of change.

Near the end of this post, I will try to explain why the architects of the Modern Synthesis were so committed to a theory they could not have proved, and that seems hopelessly wrong today.  For now, I just want to point out that this is a consistent position presented in forceful language, based on direct and confident appeals to concepts from population genetics.

A lopsided legacy

No one literally defends the “gene pool” theory anymore, nor does anyone (except Richard Dawkins) dismiss the role of mutation rates.  Alas, this shift in mainstream beliefs was not accompanied by a revolution or any conscious restructuring of evolutionary theory.  The reformist energy that should have gone into developing a mutationist alternative was sucked away by the Neutral Theory.  Over time, Mayr and his cohort died off, and their intellectual descendants just stopped saying the things that were clearly wrong.  Today we are left with a confused mixture, a Franken-theory with some zombie parts that just won’t go away.

Consider the neo-Darwinian catechism on the role of variation in evolution, which consists of 3 concepts.  The first 2, which go back to Darwin’s time, are that mutation is merely a source of “raw materials” and merely a source of “chance.”  The third, the view of mutation and selection as opposing forces, was developed by the Modern Synthesis.

Apparently, it is not widely known that the raw materials doctrine refers literally to raw materials (e.g., crude oil, seawater, logs, coal), and that it evokes Aristotle’s 4-fold classification of causes, in which “material” causes are the lowest kind.  For instance, a shirt is made from fabric, fabric is woven from thread, and the thread is spun from either (1) natural fibers, in which case the raw materials are cotton, silkworm cocoons, etc., or (2) synthetic fibers, in which case the raw material is crude oil.   Fabric is a material, but not a raw material.  “Raw” materials are raw, unprocessed, unrefined.

Materials do not make the shirt happen, and do not dictate the size or shape.   Some agent has to spin the thread, weave the fabric, cut it, and sew the pieces together.   More generally, material causes are passive, and provide substance only, not form or initiative or direction.  The final product is not implied or embodied in the materials.   Instead, some active force or agent gives shape and form to the materials.

That is, the “raw materials” doctrine is a deliberate attempt to depict variation as a kind of passive clay that can be molded into anything, with selection as the agent— the active force shaping outcomes.  In the Darwinian “gradualist” view, a single variation contributes to adaptation in the same way that a single grain of sand contributes to a sand-castle (see Why Size Matters: Saltation, Creativity and the Reign of the DiNOs).  I suspect that if scientists stopped to consider what the “raw materials” doctrine intends, they would stop repeating it.  For instance, one sees “raw materials” in the evo-devo literature, but no one in evo-devo actually believes this— they are all searching for developmental mutations that change body plans and reconfigure toolbox genes and generally make fantastic things happen.

The view depicting mutation and selection as opposing forces, with mutation too weak to overcome selection, arose from an early mathematical result called the mutation-selection balance, which predicts the equilibrium frequency at which we would expect to see a particular disease-causing mutation.  This view makes mutation-biased adaptation seem like an impossibility.  Below, I’ll use a population-genetics model to explain why this view misleads us.
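To make the mutation-selection balance concrete, here is a sketch using the standard textbook approximations: an equilibrium frequency of roughly u/s for a haploid (or fully dominant) deleterious allele, and roughly √(u/s) for a fully recessive one, plus a deterministic haploid iteration showing the first approximation at work. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import math

def haploid_equilibrium(u: float, s: float) -> float:
    """Textbook approximation q ~ u/s (haploid, or a fully dominant allele)."""
    return u / s

def recessive_equilibrium(u: float, s: float) -> float:
    """Textbook approximation q ~ sqrt(u/s) for a fully recessive allele."""
    return math.sqrt(u / s)

def iterate_haploid(u: float, s: float, generations: int = 1000) -> float:
    """Deterministic haploid model starting from q = 0: each generation,
    selection against allele a (fitness 1 - s), then mutation A -> a at rate u."""
    q = 0.0
    for _ in range(generations):
        q_sel = q * (1 - s) / (1 - s * q)  # selection removes some a alleles
        q = q_sel + u * (1 - q_sel)        # mutation converts some A into a
    return q

u, s = 1e-6, 0.1  # illustrative values, not taken from the post
print(haploid_equilibrium(u, s), iterate_haploid(u, s), recessive_equilibrium(u, s))
```

With u = 10^-6 and s = 0.1 the deleterious allele settles near a frequency of 10^-5: mutation keeps it in the population, but only at a low level. Read as a tug-of-war, that looks like selection overpowering mutation, which is the intuition behind the “weak force” view.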

In Lady in the Water, Reggie (actor Freddy Rodríguez) only exercises his right side. In this scene, Reggie is revealed as The Protector when his stare drives away a demon. Lady in the Water earned accolades as The Worst Movie Ever, but that was before The Last Airbender.

Finally, if mutation is merely a source of “chance”, then how can a bias in mutation make evolution more predictable?

These old ways of thinking aren’t helpful.  They don’t make comprehensible the possibility of mutation-biased adaptation suggested by Galen, et al.  We’ve inherited a lopsided legacy.  When 3 generations of scientists are taught that mutation merely supplies raw material, that it is a weak force, a source of chance, etc., we can’t expect this to promote a deep understanding of mutation.  Like the character of Reggie in M. Night Shyamalan’s Lady in the Water, we’re only flexing our intellectual muscles on one side.

Re-thinking the role of mutation

To develop a new understanding of the role of mutation in evolution, let’s start by re-thinking the common metaphor of evolution as a climbing algorithm.  Imagine, as an analogy for evolution, a climber operating on the jagged and forbidding landscape of Les Drus (Figure). A human climber would scout a path to a peak and plan accordingly, but a metaphor for evolution must disallow foresight and planning, so let us imagine a blind robotic climber. The climber will move by a two-step mechanism. In the “proposal” step, the robotic climber reaches out with one of its limbs to sample a point of leverage, some nearby hand-hold or foot-hold. Each time this happens, there is some probability of a second “acceptance” step, in which the climber commits to the point of leverage, shifting its center of mass.

Les Drus (copyright Guillaume Dargaud)

Biasing the second step, such that relatively higher points of leverage have relatively higher probabilities of acceptance, causes the climber to ascend, resulting in a mechanism, not just for moving, but for climbing.
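The two-step climber can be sketched as a toy simulation. The assumptions are loud ones: the “landscape” is just a uniform slope, so each proposed hand-hold differs in height by a random amount, and the acceptance probabilities are arbitrary constants. The point is only that biasing the acceptance step turns aimless movement into net ascent.

```python
import random

def climb(steps: int, p_accept_up: float, p_accept_down: float, rng: random.Random) -> float:
    """Blind two-step climber on a hypothetical uniform slope.
    Proposal: sample a hand-hold whose height differs by a uniform random amount.
    Acceptance: commit to it with a probability that depends on whether it is higher."""
    height = 0.0
    for _ in range(steps):
        dh = rng.uniform(-1.0, 1.0)              # proposal step
        p_accept = p_accept_up if dh > 0 else p_accept_down
        if rng.random() < p_accept:              # acceptance step
            height += dh
    return height

def mean_final_height(p_up: float, p_down: float, runs: int = 200,
                      steps: int = 500, seed: int = 1) -> float:
    """Average final height over many independent climbs (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(climb(steps, p_up, p_down, rng) for _ in range(runs)) / runs

print(mean_final_height(0.5, 0.5))  # unbiased acceptance: wanders, no net ascent
print(mean_final_height(0.9, 0.1))  # acceptance favors higher holds: ascends
```

With symmetric acceptance the average final height stays near zero; with acceptance favoring higher holds, the expected gain is about 0.2 per step under these assumptions, so the climber ends well above its starting point.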

What happens if a bias is imposed on the proposal step? Imagine that the robotic climber (perhaps by virtue of longer or more active limbs on one side) samples more points on the left than on the right during the proposal step. Because the probability of proposal is greater on the left, the joint probability of proposal-and-acceptance is (on average) greater for leftward moves, so the trajectory of the climber will be biased, not just upwards, but to the left as well.  If the landscape is rough, the climber will tend to get stuck on a local peak that is upwards and to the left of its starting point.
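The same toy can be extended with a sideways coordinate (still a hypothetical uniform slope). The sketch below biases only the proposal step: leftward hand-holds are sampled with probability `p_left`, while acceptance looks only at height. A rough landscape with local peaks is not modeled here; the sketch only shows the directional effect.

```python
import random

def climb_2d(steps: int, p_left: float, rng: random.Random,
             p_accept_up: float = 0.9, p_accept_down: float = 0.1) -> tuple[float, float]:
    """Blind climber on a hypothetical uniform slope: dy is height gain, dx is
    sideways movement. Proposals point left with probability p_left;
    acceptance depends only on dy, never on dx."""
    x = y = 0.0
    for _ in range(steps):
        direction = -1.0 if rng.random() < p_left else 1.0
        dx = direction * rng.random()            # biased proposal: left vs. right
        dy = rng.uniform(-1.0, 1.0)
        p_accept = p_accept_up if dy > 0 else p_accept_down
        if rng.random() < p_accept:              # acceptance ignores direction
            x += dx
            y += dy
    return x, y

def mean_position(p_left: float, runs: int = 200, steps: int = 500,
                  seed: int = 2) -> tuple[float, float]:
    """Average final (x, y) over many independent climbs."""
    rng = random.Random(seed)
    xs, ys = zip(*(climb_2d(steps, p_left, rng) for _ in range(runs)))
    return sum(xs) / runs, sum(ys) / runs

print(mean_position(0.5))  # symmetric proposals: climbs, no sideways trend
print(mean_position(0.7))  # left-biased proposals: climbs up AND to the left
```

With `p_left = 0.7`, about 70% of proposals point left, and because acceptance is blind to direction, the accepted moves inherit that bias: the expected sideways displacement is about -0.1 per step under these assumptions, while the ascent is unchanged.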

Now, let us take this idea and make it into a population-genetics model, following Yampolsky and Stoltzfus (2001).  We’ll start with an ab population and evolve either to Ab or aB.  That is, we are going on a one-step climb, and we’ll climb either to the left or to the right.  Obviously, if Ab is the more fit alternative (i.e., if s1 > s2) and the mutation rate to Ab is higher (u1 > u2), then Ab is favored.  But what if Ab is the more fit alternative, and mutation favors aB?  That’s the critical case.

The results are shown at right.  The upward slope indicates that the bias in outcomes (toward the mutationally favored peak, aB) increases with increased mutation bias.  In the smallest population, we are looking at neutral evolution, where only the bias in mutation (dashed line showing B = u2/u1) matters.  As population size increases, we enter the regime of origin-fixation dynamics (or what Gillespie calls “strong selection, weak mutation”), where there is a proportional effect of both mutation bias and fixation bias, shown by the dotted line, which is (u2/u1)*(s2/s1).
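The origin-fixation logic can be written down directly: the rate of each one-step path is the supply of new mutants times their probability of fixation. The sketch below assumes a haploid Wright-Fisher population and Kimura’s diffusion formula for the fixation probability, with u1, s1 for Ab and u2, s2 for aB as in the setup above. The numbers are illustrative and are not the parameters behind the figure.

```python
import math

def fixation_probability(s: float, n: int) -> float:
    """Kimura-style fixation probability of one new mutant with advantage s
    in a haploid Wright-Fisher population of size n (an assumed simplification)."""
    if s == 0:
        return 1.0 / n  # neutral limit
    return math.expm1(-2 * s) / math.expm1(-2 * n * s)

def origin_fixation_bias(u1: float, s1: float, u2: float, s2: float, n: int) -> float:
    """Ratio of the rate of ab -> aB to the rate of ab -> Ab.
    In origin-fixation dynamics each rate is (mutant supply n*u) * Pfix."""
    rate_Ab = n * u1 * fixation_probability(s1, n)
    rate_aB = n * u2 * fixation_probability(s2, n)
    return rate_aB / rate_Ab

# Critical case: selection favors Ab 2-fold, mutation favors aB 8-fold
# (illustrative numbers only); uN stays small even at n = 1e6.
for n in (2, 100, 10_000, 1_000_000):
    print(n, round(origin_fixation_bias(1e-9, 0.02, 8e-9, 0.01, n), 2))
```

In very small populations (2Ns near zero), fixation probabilities barely depend on s, and the outcome ratio approaches the mutation bias u2/u1 = 8. In large populations, Pfix is roughly 2s, and the ratio approaches roughly (u2/u1)(s2/s1) = 4: the mutationally favored but less fit aB remains the more likely outcome.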

As population size gets larger and uN is no longer small, we depart from strict origin-fixation dynamics, but there is still an effect of mutation bias (uN is about 1 and 10 for the 2 largest populations).

Evolution doesn’t have to work this way, through the origin and fixation of one new mutation at a time.  In a previous post I invoked 2 different styles of self-service restaurant (the Buffet and the Sushi Conveyor) to compare and contrast 2 different regimes of population genetics.  The sushi conveyor offers a dynamic, iterated process of proposal and acceptance.

The sushi conveyor: We iteratively make a yes-or-no choice on the chef’s latest creation as it passes by on a moving conveyor. A bias in the rate of appearance directly biases the outcome.

We choose (we select), but we don’t control what is offered or when: instead, we accept or reject each dish that passes by our table.  This is like origin-fixation models of evolutionary dynamics, which depict evolution as a discrete 2-step process of mutational origination followed by fixation or loss (by selection or drift).

The Buffet: we begin with a practically inexhaustible abundance of static choices in full view, and fill our plate with the desired amount of each dish. A bias in amount does not change outcomes.

The architects of the Modern Synthesis viewed adaptation differently, according to the buffet model.  Just as the staff who tend the buffet will keep it stocked with a variety of choices sufficient to satisfy every customer, the gene pool “maintains” abundant variation sufficient to meet any adaptive challenge (selection never has to wait for a new mutation).  Adaptation happens when the customer gets hungry and proceeds to select a platter of food from the abundance of available choices.

A bias in variation will operate completely differently in the two models.  Let us suppose that the buffet has 5 apple pies and 1 cherry pie. This quantitative bias in what is offered to the customer makes no difference.  A customer who prefers cherry pie will choose a slice of cherry pie every time; and likewise a customer who prefers apple will choose apple.  But at the sushi conveyor, the effect of a bias will be different.  Let us suppose that occasionally a dish of sashimi comes by on the conveyor, with a 5 to 1 ratio of salmon to tuna.  Even a customer who would prefer tuna in a side-by-side comparison may end up choosing salmon more frequently— a side-by-side comparison simply is not part of the process.
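The sashimi arithmetic can be made explicit. In this sketch the acceptance probabilities are hypothetical: the customer is assumed to take a passing tuna dish 3 times as often as a passing salmon dish, while salmon arrives 5 times as often as tuna.

```python
from fractions import Fraction

# Hypothetical numbers: 5:1 salmon-to-tuna arrivals, 3:1 preference for tuna.
arrival = {"salmon": Fraction(5, 6), "tuna": Fraction(1, 6)}
acceptance = {"salmon": Fraction(1, 10), "tuna": Fraction(3, 10)}

def conveyor_choice(arrival: dict, acceptance: dict) -> dict:
    """Sushi conveyor (origin-fixation): the chance that the first dish taken
    is of a given kind is proportional to arrival rate times acceptance."""
    joint = {dish: arrival[dish] * acceptance[dish] for dish in arrival}
    total = sum(joint.values())
    return {dish: p / total for dish, p in joint.items()}

def buffet_choice(acceptance: dict) -> str:
    """Buffet (standing variation): everything is on display, so the customer
    simply takes the preferred dish; the quantities on offer are irrelevant."""
    return max(acceptance, key=acceptance.get)

print(conveyor_choice(arrival, acceptance))
print(buffet_choice(acceptance))
```

On the conveyor, salmon is the first dish taken 5/8 of the time even though the customer prefers tuna; at the buffet, the 5:1 ratio on display makes no difference and the customer takes tuna every time.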

The reason I like this metaphor is that we can relate it directly to population genetics.  The sushi conveyor corresponds to the origin-fixation regime, in which the chance of a particular choice being made depends directly on the mutation rate, because each change depends on a new mutation.  By contrast, in the “gene pool” (buffet) regime, all the variants relevant to the outcome of evolution are present initially.  Using the model above, if we put the alternative genotypes aB and Ab into the starting population at just 0.5% frequency, this kills the effect of mutation bias completely, as shown by the flat lines in the figure below.

When the variants relevant to the outcome of evolution are present in the initial population, biases in mutation don’t matter.
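Here is a deterministic sketch of the buffet regime, assuming a haploid model with the same two one-step alternatives (Ab with s1 = 0.02, aB with s2 = 0.01), both seeded at 0.5% frequency. Drift is deliberately left out, so this does not reproduce the stochastic simulations behind the figure; it only shows that once both alternatives are already present, even a 100-fold mutation bias toward aB barely moves the outcome. The function name and defaults are hypothetical.

```python
def standing_variation_outcome(u1: float, u2: float, s1: float = 0.02, s2: float = 0.01,
                               start: float = 0.005, generations: int = 1000):
    """Deterministic haploid sketch of the 'buffet' regime (no drift).
    Ab (fitness 1 + s1) and aB (fitness 1 + s2) each start at frequency `start`;
    ab keeps mutating to Ab at rate u1 and to aB at rate u2.
    Returns the final frequencies (ab, Ab, aB)."""
    ab, Ab, aB = 1 - 2 * start, start, start
    for _ in range(generations):
        # mutation out of ab (double mutants ignored)
        ab, Ab, aB = ab * (1 - u1 - u2), Ab + ab * u1, aB + ab * u2
        # selection, then renormalize by mean fitness
        w_bar = ab + Ab * (1 + s1) + aB * (1 + s2)
        ab, Ab, aB = ab / w_bar, Ab * (1 + s1) / w_bar, aB * (1 + s2) / w_bar
    return ab, Ab, aB

for bias in (1, 10, 100):  # mutation favors aB by this factor
    print(bias, [round(f, 4) for f in standing_variation_outcome(1e-8, bias * 1e-8)])
```

In each case Ab ends up near fixation: the trickle of new aB mutants (at most about 10^-6 per generation here) is trivial next to the compounding, roughly 1% per-generation fitness advantage of Ab over aB.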

Why is selection so much more effectual in the buffet regime?  The probability of fixation of a new beneficial mutation is about 2s (Haldane’s classic approximation, valid for small s).  If sAb = 0.02 and saB = 0.01, this 2-fold difference in s corresponds to a 2-fold preference for Ab in the origin-fixation regime.  In the buffet regime, the impact of exactly the same fitness difference is far greater: if we have 2 alternative alleles already in a population and they have escaped the drift barrier, selection pretty much always establishes the more fit alternative.
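As a rough check on the “about 2s” figure, the sketch below computes the survival probability of a new mutant lineage in a branching process, assuming Poisson offspring with mean 1 + s (a standard idealization for a large population, not a model taken from the post).

```python
import math

def survival_probability(s: float, tol: float = 1e-15) -> float:
    """Probability that a single new mutant lineage escapes loss, assuming
    Poisson offspring with mean 1 + s. Solves pi = 1 - exp(-(1 + s) * pi)
    for its positive root by bisection."""
    def g(p: float) -> float:
        # positive between 0 and the root, negative above it (g is concave)
        return -math.expm1(-(1 + s) * p) - p

    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for s in (0.02, 0.01):
    print(s, 2 * s, round(survival_probability(s), 4))
print(survival_probability(0.02) / survival_probability(0.01))
```

For s = 0.02 and s = 0.01 the survival probabilities come out a little below 2s (about 0.039 and 0.0197), and their ratio stays close to 2, so a 2-fold difference in s translates into roughly a 2-fold preference in the origin-fixation regime.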

Why is mutation bias so important in one regime but not in the other?  The bias operating so effectively in the origin-fixation regime is a bias in the introduction process, i.e., a bias in the rate of introduction of new alleles.   This kind of bias is a profoundly important effect that directly impacts the course of evolution.  But in the buffet regime, there is no bias in the introduction process, because there is no introduction process— all relevant alleles are present already.  Once the alleles are present, mutation can shift their relative proportions in a biased way, but such shifts are quantitatively trivial compared to the shifts caused by selection (or even drift).  This is why the architects of the Modern Synthesis said that mutation is a “weak force”.

Dual-causation imagined in a dual-yoke aircraft. In one version of dual causation, pilot #1 (selection) and pilot #2 (mutation) fight over the controls. The stronger one wins.  Mutation-biased adaptation does not make sense this way.  In a different kind of dual causation, pilot #1 moves the yoke fore and aft, controlling the nose of the plane, and pilot #2 turns the handle left or right, controlling the direction.

Let’s return to metaphors one last time.  For a century, we have understood that mutation and selection are both necessary.  But the role of mutation has been depicted as supplying raw materials or chance.  When seen as a force, mutation is said to be weak.  We have imagined selection in charge without any mutational effects, or mutation in charge when selection is absent (neutral evolution), but making them co-pilots is a new way of thinking (Figure).

How the Modern Synthesis got population genetics wrong

I promised earlier that I would explain why the architects of the Modern Synthesis were so committed to a view that they had not proved, and which led them to believe that the rate of evolution would not reflect the rate of mutation.

Evolution as movement in the topological interior of an allele-frequency space (left).  Each dimension is the frequency of allele 1 vs. allele 2 at a locus. The “shifting gene frequencies” view depicts evolution as a smooth shift from one set of optimal allele frequencies to another, involving many loci at once.   Change is initiated by an environmental change, and proceeds until a new optimum is reached.

First, let’s review the classic consensus on the genetic basis of evolutionary change.  Prior to the molecular revolution, the Modern Synthesis held that change consists of a smooth shift, in the interior of a gene-frequency space, from the previous optimal multi-locus distribution of allele frequencies to the new optimum.  The “shifting gene frequencies” consensus equated evolution with adaptation, and stressed that evolutionary change is

  • initiated by an environmental shift (which disrupts the current optimum)
  • driven by selection
  • fueled by available variation + recombination (“gene pool”)
  • not dependent on new mutations
  • multi-factorial, involving many loci each with small effects

Why were the architects of the Modern Synthesis so strongly committed to an elaborate view that they hadn’t proved?   Welcome to the world of science.  In science, there are theories, conceptual systems for generating explanations and predictions through formal and informal reasoning (e.g., the metaphors and analogies used above).  The predictions of theories like the Modern Synthesis are not always bottom-up predictions.  If some high-level proposition Y is accepted as true, a lower-level proposition X is implied if X is the only way to get to Y.   This is not the logical fallacy of affirming the consequent: if the theory asserts the truth of Y, and X is necessary for the truth of Y, then X (and all of the implications of X) are predictions of the theory.  We can put this in more flexible Bayesian terms to the effect that, if Y is more likely when X is true, then evidence for Y increases our belief in X.
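The Bayesian version of this reasoning can be spelled out with Bayes’ rule, assuming 0 < P(X) < 1 and P(Y) > 0: observing Y raises the probability of X exactly when Y is more likely under X than under not-X.

```latex
\begin{aligned}
P(X \mid Y) &= \frac{P(Y \mid X)\,P(X)}{P(Y)},
\qquad
P(Y) = P(Y \mid X)\,P(X) + P(Y \mid \neg X)\,\bigl(1 - P(X)\bigr) \\[4pt]
P(X \mid Y) > P(X)
&\iff P(Y \mid X) > P(Y) \\
&\iff P(Y \mid X)\,\bigl(1 - P(X)\bigr) > P(Y \mid \neg X)\,\bigl(1 - P(X)\bigr) \\
&\iff P(Y \mid X) > P(Y \mid \neg X)
\end{aligned}
```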

In this case, what are X and Y?  The Modern Synthesis was committed to the buffered “gene pool” view (X), because this is the particular view of population genetics that justifies Darwinian doctrines of gradualism, the creativity of selection, the subordinate status of variation as a source of random raw materials, and the control of selection over the direction of evolution (Y).   Stripped of all nuance, the core claim of Darwinism is that, because organisms are exquisitely adapted, down to the finest detail, the mechanism of evolution must supply abundant infinitesimal raw materials for selection to shape the organism precisely to conditions.  The Modern Synthesis “gene pool” view provides the needed mechanism.  Alas, it’s mistaken.


Once again, I’ve said way too much, so let’s review the big picture.

For decades, a minority of evolutionists have been fascinated by the influence of biases in mutation in shaping genes, proteins and genomes.  For decades, mainstream scientists have been dissecting natural cases of adaptation, and also carrying out adaptation in the lab.  Those previously separate research directions come together in Galen, et al, and in some other studies such as Couce, et al., 2015 and Meyer, et al (see The Revolt of the Clay).  The results suggest that the textbook doctrine on the role of mutation in evolution is incorrect:

  • raw materials: mutation is not merely supplying raw materials because, in the case reported by Galen, et al, a single mutation changes affinity by 34% (i.e., it’s not like 1 sand-grain among thousands)
  • mere chance: mutation is not merely a source of chance because, in this case, the bias in mutation makes evolution more predictable, not less predictable
  • weak force: evolution does not follow the logic of opposing forces where either selection or mutation must prevail, but instead allows simultaneous dual causation

For those who don’t care about theory, concepts, or history, the point of Galen, et al (and the other studies cited above) is that ignoring non-randomness in mutation means ignoring a potentially important source of information about what is likely to happen in evolution, and conversely, studying the rates of mutations gives us more leverage to predict and explain evolution.



