How would you model the evolution of two genotypes across generations?


Say you have a genotype A that produces x offspring and another genotype B that produces y offspring, where x>y. These x offspring are of genotype A but with modest differences in fitness due to mutation and these y offspring are of genotype B but with modest differences in fitness due to mutation. How would you model how many offspring these x offspring can themselves produce and how many offspring these y offspring can themselves produce as a function of the quantity their original parents produced?

Obviously, you could say that each descendant of genotype A produces x and each descendant of genotype B produces y, but that would unrealistically benefit genotype A, since realistically, one of B's offspring could have a chance to be fitter than A's by developing a favorable mutation or one of A's offspring could have a chance to be less fit than B's by developing an unfavorable mutation.
You could say that each descendant of genotype A and B produce (x+y)/2, but this would not be fair to genotype A, since it is fitter, so its offspring would probably be fitter than B's offspring.

You could use the breeder's equation and say that each of A's offspring produce xh+m(1-h), where h is the heritability of fitness and m is the population mean, and that each of B's offspring produce yh+m(1-h). But this is also unfair to genotype B because all of its offspring are less fit than all of A's offspring, since x>y. So, what's a way of modeling how resultant mutations might occur, such that B has a chance of producing some offspring that are fitter than A's offspring?


So, what's a way of modeling how resultant mutations might occur, such that B has a chance of producing some offspring that are fitter than A's offspring?

Here's one possible solution that fits your criteria. You could have something like this (with "d" being the distance between "x" and "y", i.e. d=x-y and "n" being the n-th offspring, so it varies between 1 and x for A and 1 and y for B) :

A: n -> x + r(n), where r(n) is a random number between -d and +d (so each offspring's output lies between x-d and x+d)
B: n -> y + r(n), where r(n) is a random number between -d and +d (so each offspring's output lies between y-d and y+d)

(I write r(n) not because r is a function of n, but because a different number is drawn for each n. I also didn't specify r's distribution; that would also need to be decided when making an actual model.)

This is your situation where some offspring of B have a chance of being fitter than offspring of A: the fittest offspring of B (up to y+d = x) can be fitter than the least-fit offspring of A (down to x-d = y). I think it fits your criteria and thus answers your question. The total number of grandoffspring is x or y squared (what you would get if each offspring had the same fitness as its parent) plus the sum of all the "r"s.
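
As a rough illustration of the model above, here is a minimal Python sketch. It assumes r(n) is drawn uniformly from the integers between -d and +d, so each offspring's output stays within d of its parent's. The answer leaves the distribution open, so the uniform draw, the clamp at zero, and the function names are assumptions made only for this example.

```python
import random

def grandoffspring_counts(parent_output, d, rng):
    """Offspring counts produced by each of a parent's offspring.

    Each of the `parent_output` offspring produces parent_output + r offspring,
    where r is a uniform random integer in [-d, d] (assumed distribution).
    Counts are clamped at zero because an individual cannot have a negative
    number of offspring (this only matters when d exceeds parent_output).
    """
    return [max(0, parent_output + rng.randint(-d, d)) for _ in range(parent_output)]

def simulate(x, y, seed=0):
    """Simulate one generation of reproduction for the offspring of A and B."""
    rng = random.Random(seed)
    d = x - y  # distance between the two parental outputs
    a = grandoffspring_counts(x, d, rng)
    b = grandoffspring_counts(y, d, rng)
    return a, b

a, b = simulate(x=10, y=7, seed=1)
print("A's offspring produce:", a, "total:", sum(a))
print("B's offspring produce:", b, "total:", sum(b))
print("Some B offspring beat some A offspring:", max(b) > min(a))
```

Swapping `rng.randint` for a draw from another distribution (for example a rounded normal) changes how often B's descendants overtake A's without changing the structure of the model.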


Population Evolution

People did not understand the mechanisms of inheritance, or genetics, at the time Charles Darwin and Alfred Russel Wallace were developing their idea of natural selection. This lack of knowledge was a stumbling block to understanding many aspects of evolution. The predominant (and incorrect) genetic theory of the time, blending inheritance, made it difficult to understand how natural selection might operate. Darwin and Wallace were unaware of the Austrian monk Gregor Mendel's 1866 publication "Experiments in Plant Hybridization", which came out not long after Darwin's book, On the Origin of Species. Scholars rediscovered Mendel's work in the early twentieth century, at which time geneticists were rapidly coming to an understanding of the basics of inheritance. Initially, the newly discovered particulate nature of genes made it difficult for biologists to understand how gradual evolution could occur. However, over the next few decades scientists integrated genetics and evolution in what became known as the modern synthesis: the coherent understanding of the relationship between natural selection and genetics that took shape by the 1940s. This concept is generally accepted today. In short, the modern synthesis describes how evolutionary processes, such as natural selection, can affect a population's genetic makeup, and, in turn, how this can result in the gradual evolution of populations and species. The theory also connects population change over time (microevolution) with the processes that gave rise to new species and higher taxonomic groups with widely divergent characters (macroevolution).


Population Genetics

Recall that a gene for a particular character may have several alleles, or variants, that code for different traits associated with that character. For example, in the ABO blood type system in humans, three alleles determine the particular blood-type protein on the surface of red blood cells. Each individual in a population of diploid organisms can only carry two alleles for a particular gene, but more than two may be present in the individuals that make up the population. Mendel followed alleles as they were inherited from parent to offspring. In the early twentieth century, biologists in a field of study known as population genetics began to study how selective forces change a population through changes in allele and genotypic frequencies.

The allele frequency (or gene frequency) is the rate at which a specific allele appears within a population. Until now we have discussed evolution as a change in the characteristics of a population of organisms, but behind that phenotypic change is genetic change. In population genetics, the term evolution is defined as a change in the frequency of an allele in a population. Using the ABO blood type system as an example, the frequency of one of the alleles, I^A, is the number of copies of that allele divided by all the copies of the ABO gene in the population. For example, a study in Jordan found a frequency of I^A to be 26.1 percent. The I^B and I^0 alleles made up 13.4 percent and 60.5 percent of the alleles respectively, and all of the frequencies added up to 100 percent. A change in this frequency over time would constitute evolution in the population.
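
The definition above (copies of one allele divided by all copies of the gene) can be sketched in a few lines of Python. The five diploid genotypes below are made-up illustration data, not the Jordanian study, and the function name is hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return the frequency of each allele in a list of diploid genotypes.

    Each genotype is a pair of alleles, so a population of N individuals
    contributes 2N allele copies to the denominator.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    return {allele: n / total for allele, n in counts.items()}

# Hypothetical sample of five individuals (ten allele copies).
genotypes = [("IA", "IA"), ("IA", "I0"), ("IB", "I0"), ("I0", "I0"), ("IA", "IB")]
freqs = allele_frequencies(genotypes)
for allele, freq in sorted(freqs.items()):
    print(f"{allele}: {freq:.1%}")
```

Running the same function on samples from two time points and comparing the results is the simplest way to check whether allele frequencies, and therefore the population, have changed.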

The allele frequency within a given population can change depending on environmental factors; therefore, certain alleles become more widespread than others during the process of natural selection. Natural selection can alter the population's genetic makeup, for example, when a given allele confers a phenotype that allows an individual to better survive or have more offspring. Because many of those offspring will also carry the beneficial allele, and often the corresponding phenotype, they will have more offspring of their own that also carry the allele, thus perpetuating the cycle. Over time, the allele will spread throughout the population. Some alleles will quickly become fixed in this way, meaning that every individual of the population will carry the allele, while detrimental mutations may be swiftly eliminated from the gene pool if they are derived from a dominant allele. The gene pool is the sum of all the alleles in a population.

Sometimes, allele frequencies within a population change randomly with no advantage to the population over existing allele frequencies. This phenomenon is called genetic drift. Natural selection and genetic drift usually occur simultaneously in populations and are not isolated events. It is hard to determine which process dominates because it is often nearly impossible to determine the cause of change in allele frequencies at each occurrence. An event that initiates an allele frequency change in an isolated part of the population, which is not typical of the original population, is called the founder effect. Natural selection, random drift, and founder effects can lead to significant changes in the genome of a population.
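
To see selection and drift acting together, one common textbook device is a Wright-Fisher style simulation. The sketch below is not taken from the text above; it assumes a haploid population of constant size, one focal allele with relative fitness 1 + s, and non-overlapping generations, and all names and parameter values are illustrative.

```python
import random

def wright_fisher(p0, pop_size, generations, s=0.0, seed=None):
    """Track the frequency of a focal allele under selection and drift.

    p0          starting frequency of the focal allele
    pop_size    number of allele copies sampled each generation
    s           selection coefficient (focal allele has fitness 1 + s, s > -1)
    Returns the list of frequencies, one entry per generation including the start.
    """
    if s <= -1:
        raise ValueError("s must be greater than -1")
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: deterministic shift in the expected frequency.
        p_selected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: binomial sampling of pop_size copies for the next generation.
        copies = sum(1 for _ in range(pop_size) if rng.random() < p_selected)
        p = copies / pop_size
        trajectory.append(p)
    return trajectory

neutral = wright_fisher(p0=0.5, pop_size=100, generations=50, s=0.0, seed=42)
favored = wright_fisher(p0=0.5, pop_size=100, generations=50, s=0.1, seed=42)
print("neutral allele, final frequency:", neutral[-1])
print("favored allele, final frequency:", favored[-1])
```

Rerunning with different seeds shows why it is hard to attribute a single observed change to one process: with s = 0 the frequency still wanders, and with a small population even a favored allele can be lost by chance.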


Evolutionary prediction: “I do not think it means what you think it means”

When discussing evolutionary prediction, we must first settle on a shared vision of what it means to predict. Biologists often use the term predict rather loosely, conflating a variety of related but distinct ideas. The word derives from the Latin verb praedicere, which itself merges prae (prior or in advance), and dicere (to say). The Merriam-Webster dictionary defines predict as “to declare or indicate in advance.” This is most often used in terms of making statements about future events. But it is important to recognize that scientists can also make predictions about the as-yet-unmeasured outcome of a historical event (sometimes called retrodiction). For instance, knowing the rules of molecular evolution, we can use the DNA sequences for a set of related species to make a probabilistic prediction about the homologous DNA sequence in an as-yet-unsequenced related species. Or, knowing the genetic basis of evolutionary loss of armor in freshwater stickleback in many watersheds, we can predict with confidence that a particular sequence of the gene EDA will occur in a previously unstudied lake population of this fish.

For this article, we are specifically interested in the narrower use of prediction in terms of statements of future evolutionary events. Therefore, we will not be discussing the very large and well-established literature building and documenting biology's understanding of evolutionary history, except where it provides tools or data that illuminate future evolution. With this emphasis on predicting evolutionary future, it is helpful to briefly consider the discipline of future studies (Poli 2017). Researchers in this field distinguish between the terms forecast versus prediction. Forecasts are typically precise and quantitative and are often based on a theoretical model or perhaps time series analysis of historical data that can be projected into the future with quantitative estimates of confidence or error. Prediction, in contrast, is often used to describe qualitative and subjective statements, sometimes based on informed intuition. In keeping with common usage within evolutionary biology, we stick with the term prediction for most of this article as the overarching concept, of which forecasting is a quantitative subset.

One can make useful predictions at many different biological levels of organization and timescales, as well as with various degrees of precision. At the simplest level, we can confidently state that evolution (genetic change) will take place, whether because of natural selection or genetic drift (figure 1a). Meta-analyses have confirmed that evolutionary forces are pervasive in natural populations, including natural selection (Caruso et al. 2017), sexual selection (Kingsolver et al. 2010), gene flow (Frankham 2015), nonrandom mating (Jiang et al. 2013), and genetic drift (Leinonen et al. 2007). Because evolution is ubiquitous, predicting its existence is trivial and does not provide actionable information.

Figure 1. Precision and scales of evolutionary forecasting. As was described in the text, there are varying degrees of precision and scale at which evolution may be forecast. (a) Nearly all traits and genes are subject to evolutionary change, making this the most reliable but least precise prediction. (b) In a constant environment, populations with sufficient genetic variation will evolve toward fitness peaks that increase mean fitness (Fisher's fundamental theorem of natural selection). We can therefore forecast that adaptation will occur even if we are uncertain of the specific traits or genes driving this adaptation. A more precise prediction would specify the traits (c) or genes (d) that will drive evolutionary change. (e) Even greater precision comes from forecasting the magnitude and direction of trait evolution (the black line) using quantitative methods such as the breeder's equation, which requires information on genetic and phenotypic covariances (G, P, represented by the grey oval) and selection strength (red dashed line). (f) Forecasting requires information on environmental settings, which may allow us to make predictions for numerous populations spanning a range of environmental settings: To what extent will these evolve in parallel or diverge? (g) Interacting species (species 1 blue and species 2 black lines, respectively) can drive each other's evolution through ecological interactions, such as character displacement between competitors, which requires community-level forecasting. (h) We may seek to forecast how evolutionary change by species within a community alters ecosystem properties, which can feed back to change interactions among and selection on those communities. The gray box that encompasses the top four panels is the focus of this article; the lower two panels (f) and (g) are not specifically addressed in the present article. Abbreviations: GP⁻¹, heritability; R, response to selection; S, strength of selection.

A slightly more useful statement would be that, in a given constant environment, a population's mean fitness should increase over time, as is laid out in Fisher's fundamental theorem of natural selection (Fisher 1930, Shaw 2019). Furthermore, this model predicts that the rate of increase in mean fitness will be proportional to the genetic variance in fitness traits. In other words, populations that have greater genetic variation in traits associated with fitness increase in mean fitness more rapidly by natural selection. The necessary assumptions are nontrivial. First, there must be genetic variation for traits affecting fitness (and therefore genetic variation in fitness). Second, selection must be strong enough to overwhelm genetic drift (equivalently, population sizes must be large enough that genetic drift is weak). Third, environmental change and density-dependent competition can both reduce mean fitness faster than adaptation can occur, by changing the phenotypic optimum of the fitness landscape faster than the population can approach it (McGill and Brown 2007). Predicting that adaptation will occur is preferable to the generic statement that evolution will take place, because it specifies a metric (mean fitness), direction (increasing mean fitness to an adaptive landscape peak), and a rate (figure 1b). However, it lacks mechanistic detail and so has relatively little utility for applied problems. Note also that mean fitness can increase through nonevolutionary means, via adaptive phenotypic plasticity, matching habitat choice, or niche construction (Edelaar and Bolnick 2019), or by environmental change that increases reproductive success for all individuals (e.g., increased resource availability).
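
The link between variance in fitness and the rate of adaptation can be checked numerically in the simplest possible setting. The sketch below assumes asexual haploid genotypes with fixed fitnesses in a constant environment and no drift, so all variance in fitness is heritable; in that special case the one-generation change in mean fitness equals the variance in fitness divided by the mean fitness. The genotype frequencies and fitness values are invented for illustration.

```python
def mean_fitness(freqs, fitness):
    """Frequency-weighted mean fitness of the population."""
    return sum(p * w for p, w in zip(freqs, fitness))

def fitness_variance(freqs, fitness):
    """Frequency-weighted variance in fitness among genotypes."""
    w_bar = mean_fitness(freqs, fitness)
    return sum(p * (w - w_bar) ** 2 for p, w in zip(freqs, fitness))

def next_generation(freqs, fitness):
    """Genotype frequencies after one round of selection (no drift, no mutation)."""
    w_bar = mean_fitness(freqs, fitness)
    return [p * w / w_bar for p, w in zip(freqs, fitness)]

# Hypothetical population of three asexual genotypes.
freqs = [0.5, 0.3, 0.2]
fitness = [1.0, 1.2, 0.8]

w_before = mean_fitness(freqs, fitness)
w_after = mean_fitness(next_generation(freqs, fitness), fitness)
observed_change = w_after - w_before
predicted_change = fitness_variance(freqs, fitness) / w_before

print("observed change in mean fitness: ", observed_change)
print("variance in fitness / mean fitness:", predicted_change)
```

The two printed values agree up to floating-point rounding. Setting all three fitness values equal makes the variance, and therefore the change in mean fitness, zero, which mirrors the first assumption listed above.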

A more useful and interesting goal is to predict which of the vast array of traits are likely to evolve in response to a particular selective challenge (e.g., an environmental change; figure 1c). Not all traits will evolve: Many may be at equilibria (e.g., subject to stabilizing selection), lack genetic variation, or be neutral, so their evolution is too slow to be relevant to the timescale in question (Kumar and Subramanian 2002). But, typically, at least some genes and traits are likely to be evolving at any point in time; the question is merely which ones (figure 1d). Typically, evolutionary biologists first seek to specify which phenotypic traits are evolving (and how they affect fitness), then turn to the question of which genes underlie this trait, a topic to which we will return later.

If we can identify traits that will evolve over the relevant timescale, we might then aspire to an even greater degree of predictive power: in what direction the traits will change (figure 1e). Predicting directionality should generally be within our reach because it simply requires knowledge of the sign of the slope of the selection gradients acting on the population means of the relevant traits, but see below for issues associated with predicting multivariate phenotypes. Better still, can we make a quantitative forecast? By what amount (e.g., fold change, standard deviations, proportion) will the evolving traits change over a specified unit of time? What is our uncertainty around this forecast? Such quantitative forecasting is an achievable end goal for quantitative geneticists, given information about heritability and selection acting on one or more traits over a short timescale, which may be plugged into the multivariate breeder's equation (see the glossary in box 1). A related goal is forecasting changes in the full trait distribution (e.g., variance, kurtosis, covariance), but this is a harder problem. There is no simple equation, akin to the breeder's equation, for forecasting the evolution of genetic covariances and higher moments. Phenotypic plasticity poses a substantial challenge for quantitative forecasts of trait change. Plasticity is the ability of a given genotype to produce multiple alternative phenotypes, depending on the environment. As a result, phenotypic distributions can change without any evolution, or plasticity can amplify or obscure heritable changes in traits. We lack methods to forecast plastic trait changes in as-yet-unobserved environmental conditions.
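
As a sketch of what plugging heritability and selection into the breeder's equation looks like, the Python below computes the single-trait response R = h² × S and a two-trait version in which the response is G P⁻¹ S (G and P being the genetic and phenotypic covariance matrices, and S the vector of selection differentials). The matrices and numbers are invented for illustration, and the helper functions are hypothetical, written by hand to keep the example free of external libraries.

```python
def breeders_response(h2, S):
    """Univariate breeder's equation: response R = h^2 * S."""
    return h2 * S

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Multiply a matrix (nested lists) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def multivariate_response(G, P, S):
    """Two-trait response: selection gradients beta = P^-1 S, response = G beta."""
    beta = matvec(inverse_2x2(P), S)
    return matvec(G, beta), beta

# Single trait: heritability 0.4, selection differential of 2.0 trait units.
print("univariate response:", breeders_response(0.4, 2.0))

# Two genetically correlated traits (illustrative values only).
G = [[0.4, 0.2], [0.2, 0.5]]   # genetic (co)variances
P = [[1.0, 0.5], [0.5, 1.0]]   # phenotypic (co)variances
S = [1.0, 0.5]                 # selection differentials
response, beta = multivariate_response(G, P, S)
print("selection gradients:", beta)
print("multivariate response:", response)
```

In this made-up example the gradients show direct selection on the first trait only, yet the second trait is still predicted to change because of its genetic covariance with the first. This is the kind of correlated response that makes multivariate forecasts harder than univariate ones.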

Box 1. Glossary.

Adaptive evolution. In the present article, we use adaptive evolution as a subset of evolutionary outcomes that are driven by natural selection (as opposed to random evolutionary change e.g., genetic drift), which leads to increased mean fitness through time.

Allelic segregation. Segregation of alleles occurs during gamete formation because of meiotic cell division. This and Mendel's principle of independent assortment explain why each gamete produced by a diploid organism is unique in terms of its genetic complement.

Breeder's equation. The breeder's equation R = S × h² describes the response to selection (R) as a function of the strength of selection (S) and the heritability of a phenotypic trait (h²). It is used to forecast the evolutionary response of complex multigenic traits to either natural or artificial selection. Multivariate approaches incorporate the response of multiple traits that are genetically correlated.

Density dependence. The population ecology processes in which a population's growth rates are regulated by its density. For example, as a population gets larger, resource limitations result in slower population growth or even decline.

Coevolution. The process by which reciprocal selection drives evolutionary change in two or more partner species. A common example of this would be arms-race coevolution in which defensive adaptations, such as increased toxicity, drive increased toxin resistance in a predator. In some cases, this is also used to describe cospeciation in which pairs or groups of lineages diverge together, for example, feather mites and birds.

EDA. The EDA gene encodes a transmembrane signaling protein, Ectodysplasin, involved in early development. This gene is present throughout animal lineages and regulates the interaction between ectoderm and mesoderm.

Effective population size (Ne). A measure of the potential for genetic drift to change allele frequencies due to unbalanced (not 50:50) sex ratios, temporal variation in population size, overlapping generations, and other real-world properties of populations.

Epigenetic. Epigenetic effects are changes in gene function that can be inherited but are not caused by changes in DNA sequence (i.e., mutations). Instead these typically come from modification of maternal or paternal DNA molecules such as DNA methylation that change gene expression.

Epistatic interactions. Epistatic interactions describe phenotypic effects that result from nonadditive interactions between alleles at different loci or between mutations within a single gene. Nonadditive effects imply that the phenotypic effects of alleles at one gene are changed by the genotype at another gene or between different alleles in a single gene. These interactions can generate genetically based phenotypic variation that is not passed from one generation to another because alleles at different loci segregate independently, disrupting epistatic interactions.

Gene flow. Gene flow describes the movement of alleles between populations of a species. In cases in which populations are fully reproductively isolated from each other gene flow is zero, and any gene flow via hybridization is called introgression. Gene flow tends to homogenize otherwise diverging populations, although it may also provide genetic variation on which selection can act.

Genetic architecture. The term genetic architecture describes the underlying genetic basis of traits (number of loci, their effect sizes, recombination rates, epistasis, dominance), as well as the variation within or among populations.

Genetic covariances. In evolutionary quantitative genetics, the genetic covariance is a measurable summary statistic, capturing the effects of pleiotropy and linkage disequilibrium in generating correlated values of two or more inherited traits. The strength of the covariances determines the extent to which selection on one trait drives evolutionary change in another trait (from Agrawal and Stinchcombe 2009).

Genetic drift. Genetic drift describes an evolutionary process in which sampling error generates changes in allele frequencies across generations. Unlike changes in allele frequencies associated with natural selection, changes in allele frequencies associated with genetic drift are random.

Genotype. The genetic makeup of an organism or individual.

Genotype-environment (G×E) interactions. The differential response to environmental variation by different genotypes. These interactions can reflect that fitness of a particular genotype is dependent on its environment and that the relative fitness of two or more genotypes can change depending on the environment.

Heritability. Heritability describes the proportion of phenotypic variation in a character or trait that results from genetic variation. In its broadest sense it can be characterized by the slope of the line that describes the relationship between mean parental trait values and mean offspring trait values.

Homologous. In the present article, we use the term homologous to describe genes that share functional and structural similarities because of common ancestry. A homologous protein is one that is structurally and functionally similar in different lineages because of common ancestry.

Indel. An indel is an insertion or deletion mutation in a region of DNA. This can consist of single nucleotide insertions or deletions or the insertion (or deletion) of multiple nucleotides.

Linkage disequilibrium. Nonrandom association of alleles at different genes within a population because of reduced recombination generated by physical proximity on a chromosome.

Mean fitness. Mean fitness describes the average fitness of all individuals in a population. In the present article, we use individual fitness in its classic Darwinian definition; for example, an individual's (or a genotype's) fitness is directly proportional to its reproductive contribution to breeding individuals of the next generation.

Metapopulations. An ecological concept that incorporates real-world spatial and temporal structures of species and populations. The concept predicts that most species consist of subpopulations, in which individuals can freely interbreed, connected into larger metapopulations, in which movement of individuals (and alleles) is possible but more limited. The metapopulation model also explicitly incorporates spatial and temporal environmental variation to understand the ecological and evolutionary trajectories of species.

Multivariate phenotypes. A multivariate phenotype approach (as opposed to univariate phenotype) incorporates more than one trait in analyses that assess the genetic basis of complex characters (such as disease or pathogen susceptibility). This approach incorporates multiple individual characters to enhance the power of the analysis.

Parallel evolution. The tendency for two or more replicate populations to evolve similar adaptive solutions (e.g., same gene, or morphology) to a shared environmental challenge. Parallel evolution is widely considered to be diagnostic evidence that evolution can be predictable.

Paralogs. Two or more similar genes, coexisting within the same species' genome, that are the result of gene duplication events.

Phenotype. The phenotype of an organism or individual describes the sum total of all observable traits or characteristics. It can include (but is not limited to), morphology, physiology, and behavioral characteristics.

Pleiotropy. When polymorphism at a single gene gives rise to correlated variation in two or more distinct phenotypic traits.

Population genetics. A field of genetics that deals with mathematical description of the change in allele frequencies within and between populations.

Price equation. In the present article, we use the Price equation in its original form to represent the change in a trait or allele frequencies across two generations. The Price equation (Price 1972) incorporates the covariance between fitness and traits (or allele) to provide a quantitative description of the change in trait values across a single generation.

Quantitative genetics. A field of genetics that deals with the evolution of complex traits that typically result from the interaction of multiple genes and the environment. This approach allows the quantification, measurement, and prediction of the change in trait means of populations, even when the specific genetic basis of those traits is unknown.

Recombination. During recombination alleles from maternal and paternal chromosomes are swapped resulting in novel combinations of alleles in offspring chromosomes. This process does not generate new mutations but generates novel genotypes in the offspring generation.

Selection differentials and selection gradients. Selection gradients and selection differentials both provide estimates of a trait's relationship with fitness. However, they are quantitatively different. Selection differentials are univariate estimates (e.g., the covariance between a trait and relative fitness, which for a standardized trait equals the slope of the simple linear regression of relative fitness on that trait), and so combine direct selection on the trait with indirect selection through correlated traits. Selection gradients are multivariate and represent the partial regression coefficients describing a trait's direct contribution to fitness when the other measured traits are held constant.

SNP. A single nucleotide polymorphism or a mutation of a single position in a DNA molecule that is variable within or among populations.

SPL transcription factors. Squamosa promoter binding-like (SPL) proteins are plant specific transcription factors with a regulatory function in multiple biological processes. Transcription factors are important in regulating the rate of gene expression.

Stabilizing selection. Stabilizing selection is a form of natural selection in which phenotypic extremes are selected against and phenotypes closer to the mean in a population have higher fitness. In theory, this form of selection should reduce the variance for a trait in a population without shifting the mean of the trait. A classic example of this is birth weight for human children in which low birth weight reduces the survival rate of the child and high birth weight reduces the survival of the mother.

Standing genetic variation. The standing genetic variation describes the current sum total of genetic variation within and among populations.

Transitions and transversions. These terms differentiate between types of mutations in DNA molecules. Transitions are DNA substitutions in which a purine base (A or G) or a pyrimidine base (C or T) is exchanged (e.g., G replaces A or C replaces T) in a DNA molecule. Transversions are the replacement of a purine (or pyrimidine) base with its “opposite.” For example, replacement of A with T or G with C.
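
The Price equation entry in the glossary above can be made concrete with a small numerical sketch. In its standard two-term form, the change in the mean trait across one generation is the covariance between fitness and the trait divided by mean fitness (the selection term) plus the fitness-weighted average difference between offspring and parent trait values divided by mean fitness (the transmission term). The parent traits, offspring numbers, and offspring trait means below are invented for illustration, and the function name is hypothetical.

```python
def price_equation(z, w, z_offspring):
    """Split the one-generation change in mean trait into two Price-equation terms.

    z            trait value of each parent
    w            fitness (number of offspring) of each parent
    z_offspring  mean trait value of each parent's offspring
    Returns (selection_term, transmission_term); their sum is the total change.
    """
    n = len(z)
    mean_w = sum(w) / n
    mean_z = sum(z) / n
    # Covariance between fitness and trait value among parents.
    cov_wz = sum(wi * zi for wi, zi in zip(w, z)) / n - mean_w * mean_z
    selection = cov_wz / mean_w
    # Fitness-weighted average change between parent and offspring trait values.
    transmission = sum(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_offspring)) / n / mean_w
    return selection, transmission

z = [1.0, 2.0, 3.0]            # hypothetical parental trait values
w = [1, 2, 3]                  # larger-trait parents leave more offspring
z_offspring = [1.5, 2.0, 2.5]  # offspring regress toward the mean

selection, transmission = price_equation(z, w, z_offspring)
direct_change = sum(wi * zo for wi, zo in zip(w, z_offspring)) / sum(w) - sum(z) / len(z)
print("selection term:   ", selection)
print("transmission term:", transmission)
print("sum of terms:     ", selection + transmission)
print("direct calculation:", direct_change)
```

The sum of the two terms matches the change computed directly from the offspring generation, which is the bookkeeping identity the Price equation expresses.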

The quantitative genetic approach often treats genetic, molecular, cellular, and developmental mechanisms as a black box, focusing on emergent and readily observable traits (e.g., size, shape, or behavior). A more complex goal is to predict evolution and action of finer-scale mechanistic traits (which we might call upstream traits) that ultimately generate the traits of interest at the organismal level. Examples might include timing and levels of gene expression, pathway activity, enzymatic activity or concentrations, developmental patterning, and so on. We could also study these upstream traits to attempt to predict which ones will evolve, in what direction, and by how much (figure 1c and 1e, respectively). This approach lets us predict not just evolution of the obvious traits, but could also provide a mechanistic explanation of how these trait changes are actuated by changes in gene expression, development, environment, and so on. Ultimately, all the phenotypic traits we might choose to study arise from changes in the expression of genes (where, when, how much), their translation (speed, timing, splicing), and subsequent protein function (folding, active site properties, dynamics, transport, degradation, interactions). These all have their roots in the sequence, packaging, and epigenetic modification of DNA. Therefore, many biologists feel that the ultimate question of evolutionary prediction is to anticipate the precise genetic changes underlying evolution. We can define distinct levels of predictive precision within this ultimate question of genetic forecasting (box 2).

Box 2. Distinct levels of predictive precision in molecular evolution.

Evolution will occur in a particular group of genes (e.g., gene ontology category, pathway, family of paralogs).

Evolution will occur in a particular gene.

Evolution of that gene will entail changes in particular motifs or properties of a protein (e.g., a shift in polarity or shape, or within a particular active site).

Evolution will entail changes in frequency of particular genetic variants (e.g., single nucleotide polymorphisms [SNPs], indels, gene copy number, chromosomal rearrangements). Precise evolutionary forecasting might go so far as to predict the direction, magnitude, and speed of allele frequency change, ideally with appropriate confidence intervals.

Predicting evolution of a single gene is insufficient because evolution is rarely a single-gene process. For instance, initial adaptive changes might impose costs that require compensatory mutations afterward. Therefore, for true predictive power, we should aspire to scale up the goals in figure 1a–1e to multiple genes, how they interact, and, ultimately, the whole genomic shebang (many genes, architecture, and epigenetics).

The preceding kinds of evolutionary predictions are all concerned with evolution that is occurring within a particular focal population (changes in trait distributions and genotype frequencies in a defined group of individuals; figure 1a–1d). However, evolution is more complex in that it occurs among populations connected through networks of gene flow and spatial variability (e.g., metapopulations). Prediction at the level of the species range might include the specific traits that will evolve in individual populations, leading to population divergence, and the role of gene flow among populations in constraining this divergence (figure 1f). From an ecological standpoint, this level of prediction would also include the establishment or extinction probability of individual populations.

Although the preceding points concern evolution within a focal species, ecological interactions between species (e.g., competition, predation, parasitism, mutualism) can drive simultaneous coevolution in two or more species (Thompson 1989). Each species is subject to selection to increase the benefits, or mitigate costs, of their interaction (figure 1g, species 1 and 2 as blue and black lines, respectively). The resulting evolution within each species changes the nature of their interspecific interactions, which, in turn, changes the selection that their partners or antagonists experience (so-called eco-evo feedback loops; Genung et al. 2011, Post and Palkovacs 2009). Therefore, evolutionary forecasts may need to account for coevolutionary dynamics and consider multiple species concurrently.

Moving to a still larger scale, we could instead focus on predictions about emergent community and ecosystem properties rather than a particular species. For example, we can confidently predict that in any biological community, given enough time, there will emerge guilds of primary producers, consumers, and predators. Communities will evolve toward certain kinds of body size distributions, rates of energy conversion, and abundance distributions. These higher-level predictions are easiest to make over very long timescales when environments remain stable. Our goal in the present article is to focus on the precision and scales of evolutionary forecasting that encompass the points made in figure 1a–1e and not to address prediction at the level of figure 1f and 1g.

To summarize, we frequently use "predict evolution" as shorthand for a wide range of goals with varying degrees of precision, qualitative or quantitative, applied to various scales of organization (e.g., genes, genomes, phenotypes, performance or fitness, species, communities) and arising from a range of mechanisms (e.g., selection, genetic drift, gene flow, genetic architecture, species interactions). Beyond defining what we mean by prediction, it is equally crucial that we clearly specify the timescale over which our prediction applies. Predictions for some of these combinations seem well within our reach at present, others seem like moonshots that may require a heroic effort employing all our current theory and technologies, and some may be fundamentally impossible.

Although evolutionary history is well understood, and evolutionary theory provides a powerful and well-validated means of understanding that history, long-term quantitative forecasts of future evolution remain beyond our reach. Is that simply because we lack sufficient information at present? We believe it is important to distinguish between two views: H1 is that evolution is fundamentally unpredictable, not because we lack sufficient knowledge but because it is truly too stochastic for forecasts at any useful degree of precision. H2 is that evolution is predictable and that we could make effective forecasts if we simply had the right models and sufficient data.

H1. Evolution is not predictable, no matter how much we measure

Stephen Jay Gould famously argued in Wonderful Life: The Burgess Shale and the Nature of History (1989) that evolution would not repeat itself: if we rewound the tape of life and replayed it from the Cambrian, we would be unlikely to end up with anything like humans. In this spirit (and on a shorter time span), we posit that evolution is inherently unpredictable at the molecular and population level. This is because of the unpredictability of many factors, scaling from molecular to environmental mechanisms.

Ultimately, evolution is dependent on the random process of genetic mutation. Although probabilistic aspects of mutation are quantifiable and therefore somewhat predictable (e.g., rates, variation in transitions versus transversion, mutational hotspots within the genome), in the near term, we cannot predict exactly which mutations will occur, where, or when. Even if we know the genes and genetic pathways that should be important under a specific selective pressure, reliance on de novo genetic mutation (e.g., mutation-limited evolution) makes it difficult to forecast the specific genetic changes enabling future adaptation. The counterargument (detailed later in this article) is that in large populations all possible mutations will occur with some regularity.

Conversely, selection on standing genetic variation may be easier to forecast. During meiosis in sexual organisms, recombination adds an additional element of stochastic genetic variation, creating new combinations of linked alleles, as well as mutations and chromosomal rearrangements on which selection can act. As with mutation, recombination hotspots and cold spots mean that crossing over events are not equally probable across the genome.

Genetic drift then adds an element of random changes in the frequency of existing alleles. Which adults succeed in reproducing and the number of surviving offspring are somewhat random. Even an individual carrying a beneficial mutation that confers higher expected fitness may fail to find a suitable habitat, or may be killed by a pathogen or predator, leading to the loss of their beneficial mutation. For individuals who do succeed in reproducing, allelic segregation during meiosis means that the resulting offspring will carry a random sample of their parents’ alleles, changing allele frequencies in finite populations. The net effect of these stochastic processes is a modest random change in allele frequencies.

Population genetics has a robust set of model-based descriptions of these stochastic processes. Given data on the distribution of mutation rates, recombination rates, and effective population size, we can forecast a probability distribution of rates and magnitudes of allele frequency change over a specified time. These forecasts can account for mutational and recombination hotspots across the genome. But the realization of this probabilistic process (e.g., the particular mutations) is fundamentally unpredictable in the immediate future. Over the longer term, there are enough opportunities for new mutations and drift that, through the law of large numbers, the probabilistic predictions become more useful. This is an important example of how the particulars of evolution may, in a sense, be more predictable in the longer term, contrary to the usual assumption that short-term evolution is easier to forecast.
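As a rough illustration of what forecasting a probability distribution of allele frequency change can mean, the following minimal sketch simulates neutral drift under a Wright-Fisher model and compares the simulated variance with the classical expectation. The function names, parameter values, and the choice of a pure-drift Wright-Fisher model (no mutation, selection, or recombination) are assumptions made for this example, not something specified in the text above.

```python
import random
import statistics


def wright_fisher(p0, n_copies, generations, rng):
    """Allele frequency after `generations` of pure drift.

    n_copies is the number of gene copies (2N for a diploid population).
    """
    p = p0
    for _ in range(generations):
        # Each gene copy in the next generation is drawn independently
        # from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
    return p


def drift_forecast(p0=0.5, n_copies=50, generations=10, replicates=1000, seed=1):
    """Return (simulated mean, simulated variance, expected variance)."""
    rng = random.Random(seed)
    finals = [wright_fisher(p0, n_copies, generations, rng)
              for _ in range(replicates)]
    mean = statistics.fmean(finals)
    var = statistics.pvariance(finals)
    # Neutral-drift expectation: the mean stays at p0 and
    # Var(p_t) = p0 * (1 - p0) * (1 - (1 - 1/n_copies) ** t).
    expected_var = p0 * (1 - p0) * (1 - (1 - 1 / n_copies) ** generations)
    return mean, var, expected_var
```

Any single replicate is unpredictable, but the spread across replicates tracks the analytical expectation closely, which is the sense in which the distribution, rather than the realization, can be forecast.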

The inherent stochasticity of mutation and genetic drift is compounded by epistatic interactions within and among genes. The phenotypic and fitness effect of a given mutation depends on the carrier's genotype at other loci. Therefore, the order in which substitutions occur has a dramatic impact on both the magnitude and sign of their phenotypic and fitness effects and probability of fixation (Costanzo et al. 2010). The inherent randomness of the outcomes of the mutational process means that unpredictable early substitutions impede our ability to predict the fitness effects of later substitutions. This mutation-order effect changes the fitness effects of substitutions, so it may prove inherently impossible (as opposed to impractical) to precisely forecast long-term evolution of epistatic gene networks (Sailer and Harms 2017).
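The mutation-order effect can be made concrete with a toy two-locus example. The fitness values below are hypothetical and chosen only to show sign epistasis; they do not come from any study cited here.

```python
# Hypothetical fitness values for the four two-locus haplotypes
# (illustrative numbers only). Lowercase = ancestral allele,
# uppercase = derived allele.
FITNESS = {"ab": 1.00, "Ab": 1.10, "aB": 0.95, "AB": 1.25}


def step_effects(path):
    """Fitness change at each substitution along a path of haplotypes."""
    return [FITNESS[after] - FITNESS[before]
            for before, after in zip(path, path[1:])]


# Same start and end point, different order of substitutions.
a_first = step_effects(["ab", "Ab", "AB"])  # A substitutes first, then B
b_first = step_effects(["ab", "aB", "AB"])  # B substitutes first, then A

print("A first:", [round(x, 2) for x in a_first])
print("B first:", [round(x, 2) for x in b_first])
```

With these numbers, mutation B is deleterious on the ancestral background but beneficial once A is present, so which mutation happens to arise first determines whether the second one is likely to be favored at all.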

There are also barriers to effective evolutionary prediction that arise from a fundamentally indeterministic aspect of biology, rather than stochasticity. Many-to-one mapping describes the idea that there are many solutions (genotypes, phenotypes) that yield equivalent functional outcomes. Consequently, natural selection could favor any of a number of solutions to a particular adaptive challenge. For many phenotypic traits, there exist numerous combinations of morphological structures that can yield identical functional effects (Wainwright et al. 2005). For example, the four-bar linkage lever system of labrid fishes’ jaws serves to translate force into motion, generating a lever mechanical advantage determining the capacity to produce forceful (crushing) bites versus fast movement useful for evasive prey. Because of the structural complexity of the four-bar lever system, there exist many morphological solutions (head shapes) with identical functional effects (e.g., precisely the same mechanical advantage coefficient). Assuming selection acts on function (e.g., the ability to generate forceful or fast jaw opening), then the evolution of underlying skeletal morphology is unpredictable in the sense that many skeletal shapes yield identical function (any one of which might evolve), although still other skeletal shapes are selected against (Alfaro et al. 2004).

The environments in which some organisms or populations exist may prove so variable or unstable that a consistent model (or prediction) of environment and genotype-by-environment (G×E) interactions may not be possible as timescales increase. For instance, chaotic dynamics in ecological communities suggest that there are fundamentally unpredictable changes in conditions (as opposed to our theories being incomplete; Hastings et al. 1993). However, timescale matters. Over short to medium timescales (years to decades), chaotic dynamics mean that we have no capacity to predict future environmental conditions that could impose selection on our focal organisms (dependent on population dynamics). Over very long timescales (centuries to millennia), chaotic systems can remain within stable attractors (barring global catastrophic events), defining a field within which conditions are bounded. Likewise, stochastic processes such as weather might be unpredictable over short timescales (days to weeks) but follow predictable long-term trends (e.g., global warming over the coming centuries, or even cyclic dynamics such as Milankovitch cycles). Considering these timescales and relaxing the need to know the exact species, and given predictable long-term trends and the conserved structure of ecological guilds, we should be able to make some predictions (e.g., at a minimum, we could predict there will be herbivores, carnivores, dominant strategies). However, these predictions will not help us resolve many of the questions and applications of forecasting evolution we highlighted earlier.

H2. Evolution is predictable; we just don't know how yet (models and data)

Our poor performance at correctly forecasting evolutionary outcomes (spanning molecular through organismal to ecosystem levels) may result from incomplete models of evolutionary and biological processes, or insufficient data to parameterize such models. Note that there already exists a large literature of evolutionary theory that provides key building blocks of such a model, drawing on both population genetics and quantitative genetics (e.g., the breeder's equation; Walsh and Lynch 2018). This literature has led to useful computational tools such as SLiM (Haller and Messer 2019), which can carry out whole-genome, forward-in-time, spatially explicit population genetic simulations with many of the evolutionary processes we might wish to incorporate: recombination, mutation, selection, and migration. However, even this powerful new tool excludes other processes that we know shape the direction of evolution, including epistasis, genotype-to-phenotype mapping, plasticity, species interactions, population dynamics, and others detailed below. Then, once we have a satisfactory evolutionary forecast model in hand, applying it to any real biological system would require extensive, perhaps prohibitive, empirical data to parameterize the model and generate the desired forecasts. Therefore, to forecast evolution as defined earlier (figure 1 and box 2), we need both conceptual progress and data, which we detail below.
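To show how the breeder's equation can serve as one such building block, here is a minimal sketch. The breeder's equation is R = h²S: the response to selection equals narrow-sense heritability times the selection differential, so the expected offspring value regresses toward the population mean. The Gaussian noise term standing in for mutation and environmental scatter, the function names, and all numeric values are assumptions for illustration only.

```python
import random


def expected_offspring_value(parent, pop_mean, h2):
    """Breeder's-equation expectation: regress the parent toward the mean.

    Algebraically equal to h2 * parent + (1 - h2) * pop_mean.
    """
    return pop_mean + h2 * (parent - pop_mean)


def sample_offspring(parent, pop_mean, h2, noise_sd, n, rng):
    """Draw n offspring values scattered around the expected value.

    The normal noise is an assumed stand-in for mutation and other
    unexplained variation; it is not part of the breeder's equation.
    """
    center = expected_offspring_value(parent, pop_mean, h2)
    return [rng.gauss(center, noise_sd) for _ in range(n)]


def fraction_exceeding(low_parent, high_parent, pop_mean, h2, noise_sd,
                       n=5000, seed=0):
    """Share of the low parent's offspring above the high parent's expectation."""
    rng = random.Random(seed)
    threshold = expected_offspring_value(high_parent, pop_mean, h2)
    offspring = sample_offspring(low_parent, pop_mean, h2, noise_sd, n, rng)
    return sum(value > threshold for value in offspring) / n
```

For example, with hypothetical parents producing 10 and 8 offspring, a population mean of 9, h2 = 0.5, and noise_sd = 1.0, the expected values are 9.5 and 8.5, and a nonzero minority of the less fit parent's offspring still exceed 9.5. Adding a variance term is one simple way to let the less fit lineage occasionally produce fitter descendants.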


Where Do Cellular Innovations Map onto the Tree of Life?

A first step in nearly all studies in evolutionary biology is the elucidation of phylogenetic patterns of variation. Although a purely historical perspective cannot reveal the mechanisms by which evolution proceeds, it does clarify what needs to be explained. Traditional cell biology is largely devoid of comprehensive comparative analyses, but recent studies demonstrate the power of such approaches, as illustrated by the following three examples.

The first example addresses the evolutionary origins of the network of organelles and underlying molecular features by which membrane trafficking emerged in eukaryotes. The sorting of proteins and lipids among the intracellular compartments of eukaryotic cells is mediated in part by a family of protein complexes called adaptins. Although it had been accepted for over a decade that there are only four adaptin complexes in eukaryotes, comparative genomics suggested the presence of a fifth highly divergent adaptin-like complex across eukaryotes (53). Subsequent characterization of the protein in human cells identified its cellular location and function, thereby fundamentally altering our basic understanding of vesicle-transport systems and the likely order of evolutionary events leading to their diversification. An even more recent phylogenetic analysis suggests the existence of a sixth form of adaptor complex (54), raising the possibility that still more remain to be discovered, perhaps with some complexes being restricted to a subset of taxa.

A second striking example of the power of comparative analysis to inform our basic understanding of cell biology involves the discovery of an evolutionary relationship between what were considered two very different kinds of membrane-deformation proteins. Cargo transport in eukaryotic cells involves the use of diverse pathways initiating with membrane-coated vesicles supported by clathrin, and the cage forming proteins of cytoplasmic coat protein complexes I and II (COPI and COPII). Although these proteins are lacking in amino acid sequence similarity, comparative structural analysis suggests a common molecular architecture that is also related to the membrane-curving proteins involved in both the nuclear-pore complex (NPC) (55) and the adaptins discussed above. The structural and functional insights emerging from these observations guided the development of a mechanistic understanding of the NPC (56) and yielded a novel evolutionary proposal—the “protocoatomer” hypothesis, which postulates that many vesicle-coating complexes and the NPC arose by descent with modification (55). Among other things, this concept has provided a potential explanation for how the diverse body plans of eukaryotic cells could have arisen from a simpler prokaryote-like ancestor.

In a third example, an integration of molecular and morphological phylogenetic analysis has led to the identification of novel components of centrioles and cilia, as well as to evolutionary hypotheses for how their coordinated biogenesis and functions in different cellular contexts have been achieved through duplication and divergence of an ancestral gene set (57, 58).

This small set of examples illustrates the considerable potential for more elaborate comparative analyses to elucidate the evolutionary foundations of the most basic eukaryotic cellular features. Of course, ascertainment of where cell-biological innovations map onto the Tree of Life and inference of phylogenetic points of gain and loss of various modifications will require a substantial increase in taxonomic sampling of cellular diversity. Of the estimated 5–100 million extant species, only ∼1.5 million have been described at even a rudimentary level, and most of these taxa are heavily biased toward plants, animals, fungi, and microbes with direct human impact (59) (Fig. 1). Future studies of biodiversity are likely to continue to extend to the discovery of novel phyla for quite some time (e.g., refs. 60–62). These issues, together with the fact that typically about a third of predicted protein-coding genes per sequenced genome are undefined and/or restricted to narrow taxonomic groupings, make clear that we are still missing immense swaths of information on cellular diversity. This “missing phylogeny” is likely of high value to applied research efforts in medicine, agriculture, and environmental science.

Taxonomic distribution of research articles and sequenced genomes. Modern taxonomy identifies five major eukaryotic supergroups: the Excavates (turquoise), Chromalveolates (orange), Archaeplastida (green), Amoebozoa (purple), and Opisthokonts (red). Although the total number of species on earth remains unknown, it is clear that there are far more unicellular eukaryotes than the combined total of all animals (Metazoa, an Opisthokont lineage), fungi (also Opisthokonts), and plants (Archaeplastida). However, research activity displays considerable taxonomic bias. As of January 2014, the National Center for Biotechnology Information taxonomy browser (www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) lists 338 Archaeal genomes (dark gray), 20,709 Eubacteria (light gray), 769 Metazoa, 1,201 Fungi, 251 green plants/algae, and 336 genomes from all other eukaryotic taxa (13% of eukaryotic genomes). The taxonomic distribution of PubMed citations is as follows: Archaea, 19,000; Eubacteria, 397,000; Metazoa, 576,000; Fungi, 135,000; green plants/algae, 168,000; and all other eukaryotes combined, 97,000 (<9% of publications on Eukaryotes).

Unfortunately, parts lists inferred from genome information alone can take us only so far. Although results from transcriptomics, metabolomics, etc. can provide additional information, such work must ultimately be coupled to detailed studies of individual gene products in diverse taxa. To this end, we envision the need for a new grand challenge in biology, such as the proposed Atlas of the Biology of Cells (www.nsf.gov/publications/pub_summ.jsp?ods_key=bio12009). The fundamental idea here is to develop a database for cellular/subcellular features for a judiciously chosen, phylogenetically broad set of organisms, with the goal of sampling the functional diversity of metabolic and cellular morphological traits in the fullest possible sense. To be maximally productive, such an enterprise will require the further development of automated, generalizable, and high-throughput cell-biological methods. Significant support for appropriate phylogenetic sampling, development of reliable culture methods, and standardized measurement methodology will also be necessary. Most importantly, the latter will require the establishment of not only controlled vocabularies and ontologies to provide a conceptual framework for data comparison, but also quantitative metrics for defining, comparing, and predicting cell-biological structures and processes.

The payoffs of such an organized research program are likely to be substantial. As an analogy to where evolutionary cell biology is and where it might lead, consider that whole-genome sequencing was barely a dream 25 y ago but, in the past decade, has revolutionized virtually every aspect of biology, vastly increasing our understanding of human-genetic disorders, methods for disease control, energy production, and ecosystem function. Such advances continue to inspire the development of new ’omics technologies with enormous increases in accuracy and efficiency, as well as the emergence of novel computational technologies for storage, integration, and analysis that facilitate the rapid transformation of data into knowledge.


Potential Genetic Drift (Small Population Size) Questions

Given the small size of our population, would you expect one of the alleles to go extinct given enough time? Why or why not?

Yes. Given enough time, alleles will go extinct in small to very large populations as a result of genetic drift. However, smaller populations can experience changes in allele frequency at high rates. This could lead to a rapid extinction of one allele and the fixation of another.

How would the amount of time expected for an allele to become fixed change if the population were larger? smaller?

Larger = longer; smaller = sooner.
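The "larger = longer" answer can be checked with a small simulation. This is a sketch under assumed conditions (two neutral alleles, random mating, constant population size, no mutation); the function names and population sizes are made up for the example and are not part of the classroom activity.

```python
import random


def generations_until_fixed_or_lost(n_copies, p0, rng):
    """Count generations of pure drift until one allele is lost.

    n_copies is the number of gene copies in the population.
    """
    count = round(p0 * n_copies)
    generations = 0
    while 0 < count < n_copies:
        p = count / n_copies
        # Resample every gene copy from the current allele frequency.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        generations += 1
    return generations


def mean_absorption_time(n_copies, p0=0.5, replicates=200, seed=7):
    """Average time to fixation or loss across independent replicates."""
    rng = random.Random(seed)
    times = [generations_until_fixed_or_lost(n_copies, p0, rng)
             for _ in range(replicates)]
    return sum(times) / len(times)
```

Comparing, say, mean_absorption_time(20) with mean_absorption_time(80) should show the larger population taking several times longer on average to lose one allele, although individual replicates vary widely.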

Sexual Selection via Assortative Mating. Under this scenario, the first step in the mating process (“students randomly choose a mate”) is modified. Students randomly choose a mate, but only if he or she has the same phenotype; essentially, students mate with the closest individual with the same phenotype. The rest of the steps are unaltered. Conduct three to five rounds of mating. The resulting genotype frequency distributions should show a decrease in heterozygotes, whereas both homozygous genotype frequencies increase (Figure 1). DO NOT CARRY THIS SCENARIO FORWARD INTO THE SUBSEQUENT PARTS OF THE ACTIVITY.
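The expected decline in heterozygotes can be sketched with a deterministic recursion. This example assumes one locus with two alleles, complete assortative mating, and a distinct phenotype for each genotype (so individuals mate only within their own genotype class); the activity itself may use a different dominance scheme, in which case the numbers would differ.

```python
def assortative_step(freqs):
    """One round of complete assortative mating.

    freqs is a tuple (AA, Aa, aa) of genotype frequencies.
    AA x AA gives only AA, aa x aa gives only aa, and
    Aa x Aa gives 1/4 AA, 1/2 Aa, 1/4 aa.
    """
    hom_dom, het, hom_rec = freqs
    return (hom_dom + het / 4, het / 2, hom_rec + het / 4)


def run_rounds(freqs, rounds):
    """Return the genotype frequencies after each round, including the start."""
    history = [freqs]
    for _ in range(rounds):
        freqs = assortative_step(freqs)
        history.append(freqs)
    return history


# Start at Hardy-Weinberg proportions for allele frequency 0.5.
history = run_rounds((0.25, 0.5, 0.25), 3)
for generation, (hom_dom, het, hom_rec) in enumerate(history):
    print(generation, hom_dom, het, hom_rec)
```

Under these assumptions the heterozygote frequency halves each round while the allele frequencies stay constant, which matches the qualitative pattern described above.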



The use of the term ‘genetic redundancy’ has changed over the past 100 years.

The amount of redundancy in the mapping of genotype to phenotype is a critical parameter for evolutionary outcomes under a range of models.

Distinguishing the conceptual terms ‘genotypic redundancy’ and ‘segregating redundancy’ promotes clearer language to further the study of redundancy at all levels of evolutionary biology.

Empirical determination of redundancy is challenging, but there are approaches which allow for the quantitative inference of redundancy.

The C-score measure of evolutionary repeatability is correlated with two different definitions of redundancy and can be applied to empirical datasets.

Genetic redundancy has been defined in many different ways at different levels of biological organization. Here, we briefly review the general concept of redundancy and focus on the evolutionary importance of redundancy in terms of the number of genotypes that give rise to the same phenotype. We discuss the challenges in determining redundancy empirically, with published experimental examples, and demonstrate the use of the C-score metric to quantify redundancy in evolution studies. We contrast the implicit assumptions of redundancy in quantitative versus population genetic models, show how this contributes to signatures of allele frequency shifts, and highlight how the rapid accumulation of genome-wide association data provides an avenue for further understanding the prevalence and role of redundancy in evolution.


Results

Microbial communities and their underlying metabolic interactions reflect the ecological and evolutionary histories of the component species [51]. To capture these interactions, we combine stoichiometric metabolic models with ecological and evolutionary dynamics in the multi-layered evoFBA framework (see Methods). To test the utility of this framework, we apply it to the LTEE in which E. coli populations evolve in a defined glucose-limited environment [2, 52].

To model the LTEE, we ran evoFBA simulations starting with a metabolic model of E. coli that accounts for 14 carbon sources including glucose and byproducts that can be scavenged from the environment to produce biomass and fuel associated core metabolic reactions. In each evoFBA simulation, we allowed the metabolic model to change by random mutations under global constraints that must be obeyed. Thus, each simulation produced mutant model organisms exhibiting different uptake rates, metabolic flux patterns, and resulting growth rates.

EvoFBA predicts evolution of cross-feeding between lineages with different metabolic flux distributions

Starting from a population of identical model organisms under conditions similar to the LTEE, a typical evoFBA simulation produced through random mutations more than 90,000 genetically distinct model organisms over 550 simulated daily transfer cycles (Fig. 1). The evolutionary dynamics across replicate simulations were highly reproducible in their key features, in particular the diversification of the population into two coexisting lineages (Fig. 2). Thus, throughout the paper, we will focus on results from a typical representative simulation that resulted in 97,912 different model genotypes, of which 3943 survived at least one transfer event (Fig. 1a) and 12 reached a population size of at least 10^5 cells at some point (Fig. 1b). These simulations revealed specific changes in oxygen, glucose, and acetate uptake by the model organisms (Fig. 1b). Glucose uptake and incomplete oxidation resulted in acetate secretion by the ancestral model organism, which would then switch to acetate uptake and oxidation after the glucose was exhausted. Thus, the ancestral model displayed a diauxic shift (Fig. 3a), as observed in E. coli [21]. As the in silico evolution proceeded, new model organisms arose that had increased glucose uptake and acetate production. The resulting increase in acetate concentration generated an ecological niche that was colonized by other model organisms with increased acetate uptake but reduced glucose uptake. After approximately 300 simulated daily transfer cycles (approximately 2000 generations), the simulated evolution came to a halt, with no mutant model organisms able to replace the dominant ones. Thus, the in silico dynamics produced two distinct lineages that specialized on glucose and acetate, respectively. The glucose-specialist model organisms lost the ability to consume acetate, whereas the acetate-specialist model organisms retained the ability to consume glucose but at a lower rate, and the timing of their diauxic shift was changed (Fig. 3a). As a consequence, the simulation led to a stable cross-feeding relationship between two lineages of model organisms.

Evolutionary dynamics in silico. a Numbers of surviving cells (i.e., post dilution) after each simulated cycle on a logarithmic scale. Each curve shows one of the 3943 model organism genotypes that survived at least one cycle (see text). b Relationships among ancestral and mutant model genotypes for those that reached a population of at least 10^5 cells at any point during the simulation (see Methods). Model ID indicates the identifier assigned to each model genotype, with 1 being the ancestor. Line thickness is proportional to the log10-transformed number per 10-ml volume at the start of each cycle. Coloured bars show relative uptake rates for glucose (blue), acetate (red), and oxygen (green)

Replicate runs of evoFBA. a One of five replicate simulations using the same parameter set as described in the main text and shown in Fig. 1. All simulations led to qualitatively similar outcomes. b Running evoFBA simulations with a smaller maximum mutation step size (±1 mmol/gDW/h; see Methods, eq. 5) led to the same diversification into glucose specialist and glucose-acetate co-utilizing model organisms, although the time required to achieve the diversification was substantially longer. Model ID, line thickness and coloured bars are the same as in Fig. 1

Simulated and experimental dynamics of population density and substrate concentrations. a Simulated dynamics over a 24-h transfer cycle for the evolved acetate specialist (left, ID: 44490), ancestral (middle, ID: 1), and evolved glucose specialist (right, ID: 12364) model organisms. Model IDs are the same as in Fig. 1b. b Experimental data for the 6.5KS1 (left), ancestral (middle), and 6.5KL4 (right) clones from the LTEE. Biological experiments were performed at a 10-fold higher concentration of glucose than the simulations to increase cell density and thereby improve the accuracy of the measurements of cell growth and concentrations of residual glucose and secreted acetate. Biological data are means of three replicate cultures and error bars show standard deviations

We then examined the metabolic fluxes for the two model organisms when growing on glucose and acetate (Fig. 4). On glucose, both the glucose and acetate specialists displayed similar behaviours, using the TCA cycle only partially and the glyoxylate shunt not at all (Fig. 4a and c). After switching to acetate consumption (which the glucose specialists could not do), the acetate specialists showed very different fluxes, with reverse glycolysis and full use of the TCA cycle including the glyoxylate shunt (Fig. 4b and d). We emphasize that the emergence of cross-feeding model organisms and their associated fluxes in the evoFBA simulation represents an idealized evolutionary stable state given the assumptions of the evoFBA framework.

Metabolite turnover fluxes in glycolysis and TCA cycle. Fluxes in the glucose specialist (a, b) and the acetate specialist (c, d) genotypes (model IDs 12364 and 44490, respectively) during growth on glucose (a, c) and acetate (b, d). The following metabolites and reactions are shown: ac, acetate; actp, acetyl-phosphate; akg, alpha-keto-glutarate; cit, citrate; f6p, fructose-6-phosphate; fum, fumarate; glx, glyoxylate; g6p, glucose-6-phosphate; icit, isocitrate; mal, malate; oaa, oxaloacetate; pep, phospho-enol-pyruvate; succ, succinate; succoa, succinyl-coenzyme A. PGI, ACN, ACE, and ACK are the reactions catalyzed by glucose-phosphate isomerase, aconitate hydratase, malate synthase, and acetate kinase, respectively (shown in blue). Thickness of the arrow indicates the flux over the given reaction; the reference arrow at the bottom right shows a flux of 10 mmol/gDW/h

Adaptive diversification in one LTEE population, matching evoFBA predictions

Two distinct lineages had emerged in one of the LTEE populations, called Ara-2, by 6500 generations, and they have coexisted ever since [26, 50]. The lineages are called S (small) and L (large) after their colony sizes on agar plates. The maintenance of this polymorphism depends on a cross-feeding interaction in which the L type is a better competitor for the exogenously supplied glucose and the S type is better at using one or more secreted byproducts [26], although the precise ecological and metabolic mechanisms are still unknown. Therefore, we used predictions from the evoFBA simulations to generate hypotheses about these mechanisms.

We hypothesized that, first, L specializes on glucose and secretes acetate and, second, S specializes by improved acetate consumption. We tested this hypothesis by analyzing two evolved clones sampled at generation 6500 from the S and L lineages, named 6.5KS1 and 6.5KL4, respectively. HPLC analyses confirmed the presence of acetate in a 24-h supernatant of 6.5KL4 that was grown in the same medium as the LTEE (see Methods). Acetate was not detected after growing 6.5KS1 in that supernatant (Additional file 1: Figure S1). We then measured the acetate and glucose concentrations over time in cultures of the ancestor, 6.5KS1, and 6.5KL4 clones in DM250-glucose medium (Fig. 3b). Both the L and S clones consumed glucose faster than the ancestor, consistent with previous assays [53]. Moreover, in agreement with the evoFBA results, 6.5KL4 secreted acetate, with its concentration remaining high for many hours in the monoculture, and 6.5KS1 drew down its own acetate secretion much faster than both 6.5KL4 and the ancestor. After exhausting the glucose by 6 h, 6.5KS1 showed diauxic growth and consumed acetate until it was depleted after 9 h, whereas 6.5KL4 had barely, if at all, begun to consume acetate at that time even as it had exhausted the glucose by 5 h (Fig. 3b). These results support the hypothesis that the stable coexistence of S and L depends on acetate cross-feeding, with acetate production by both the L and S lineages and more efficient acetate scavenging by the S lineage, which exhibits a faster metabolic switch from glucose to acetate (Additional file 2: Figure S2).

Physiology and fluxes in S and L clones agree qualitatively with evoFBA

The evoFBA simulation reaches an evolutionary equilibrium, whereas the interaction between the S and L lineages remained highly dynamic over thousands of generations [26]. Therefore, we examined the metabolic divergence of the S and L lineages over the course of the LTEE. We first measured the ability of clones from earlier and later generations to grow in minimal media containing glucose or acetate. S clones from later generations typically grew faster and with a shorter lag phase on acetate and more slowly on glucose than S clones from earlier generations, while the opposite trends were observed in the L lineage (Additional file 3: Figure S3) (in line with previous observations [53]). Compared to the ancestor, S clones improved their growth on acetate over evolutionary time, while L clones initially improved somewhat but were variable, with the 50,000-generation L clone showing weak growth similar to the ancestor (Fig. 5). On glucose, the opposite trend was observed with L clones consistently improving compared to the ancestor, while S clones improved initially but declined in later generations (Fig. 5). These patterns of growth relative to the ancestor are consistent with previous assays using the LTEE clones [53, 54]. These evolutionary trajectories of growth on acetate and glucose indicate character displacement and suggest tradeoffs that prevent the simultaneous optimization of growth on both carbon sources. The trajectories are qualitatively consistent with the evoFBA simulations, although the evoFBA predicts complete specialization on glucose without any acetate consumption. This evoFBA prediction represents a potential evolutionarily stable end point, which might eventually occur in the S and L lineages after more generations.

Fig. 5. Changes in growth rates of S and L on glucose and acetate over evolutionary time. Growth of S and L clones sampled at multiple generations of the LTEE was followed in DM250-acetate (a) and DM250-glucose (b) media. Clone names are shown above the horizontal red and blue bars, which denote S and L clones, respectively. The ancestor (Anc) and a 2000-generation clone (2K4) isolated prior to the divergence of the S and L lineages are also included. Growth rates (1/h) are shown according to the colour scale for 1-h sliding windows over 24-h and 7-h periods in the acetate and glucose media, respectively. Empty cells indicate missing values based on filtering negative rates or unreliable values (see Methods).

We then tested the flux patterns predicted by evoFBA (Fig. 4) by measuring, in several LTEE clones, the promoter activities of genes encoding four key metabolic enzymes, using transcriptional fusions with the gfp reporter gene (see Methods). Both S and L clones showed moderately increased promoter activity for pgi relative to the ancestor (Fig. 6). Both S and L clones exhibited larger increases in the promoter activities of acnB and aceB relative to the ancestor, with the S clones showing much greater increases than the L clones, consistent with the possibility of greater flux through the TCA cycle and glyoxylate shunt in the S acetate specialists. There were no obvious changes in the promoter activities of ackA in either the S or L lineages. Of course, there may be discrepancies between promoter activities and actual enzyme activities [55, 56]. Nonetheless, these patterns agree reasonably well with the flux predictions from the evoFBA simulations, especially as they relate to the higher activities in the S lineage of the genes that specifically promote growth on acetate. As noted above, we reiterate that the evoFBA simulations predict an eventual complete loss of the acetate-specific activities in the L lineage, whereas thus far they are merely expressed at a lower level in the L lineage than in the S lineage.

Fig. 6. Transcription levels of four genes encoding metabolic enzymes in the ancestor and evolved clones. Promoter activities measured as (dGFP/dt)/OD450nm for genes involved in glucose and acetate metabolism during the first 8 h of growth in DM250-glucose. The clones are, from left to right: 50KS1, 6.5KS1, ancestor (Anc), 6.5KL4 and 50KL1. The genes are, from top to bottom: pgi encoding glucose phosphate isomerase, acnB encoding aconitate hydratase, aceB encoding malate synthase A, and ackA encoding acetate kinase. Activity values are means based on three-fold replication of each assay.


Genotype-by-environment interactions due to antibiotic resistance and adaptation in Escherichia coli

Mutations that are beneficial in one environment can have different fitness effects in other environments. In the context of antibiotic resistance, the resulting genotype-by-environment interactions potentially make selection on resistance unpredictable in heterogeneous environments. Furthermore, resistant bacteria frequently fix additional mutations during evolution in the absence of antibiotics. How do these two types of mutations interact to determine the bacterial phenotype across different environments? To address this, I used Escherichia coli as a model system, measuring the effects of nine different rifampicin resistance mutations on bacterial growth in 31 antibiotic-free environments. I did this both before and after approximately 200 generations of experimental evolution in antibiotic-free conditions (LB medium), and did the same for the antibiotic-sensitive wild type after adaptation to the same environment. The following results were observed: (i) bacteria with and without costly resistance mutations adapted to experimental conditions and reached similar levels of competitive fitness; (ii) rifampicin resistance mutations and adaptation to LB both indirectly altered growth in other environments; and (iii) resistant-evolved genotypes were more phenotypically different from the ancestor and from each other than resistant-nonevolved and sensitive-evolved genotypes. This suggests genotype-by-environment interactions generated by antibiotic resistance mutations, observed previously in short-term experiments, are more pronounced after adaptation to other types of environmental variation, making it difficult to predict long-term selection on resistance mutations from fitness effects in a single environment.

Keywords: Escherichia coli; antibiotic resistance; experimental evolution; pleiotropy.

© 2013 The Author. Journal of Evolutionary Biology © 2013 European Society For Evolutionary Biology.


Lesson Guide

This activity was designed to help students explore the way genetics, chance, selection, and population size lead to changes in an evolving population. As such, it works best if the students are already familiar with each of these topics and their respective terminology. We recommend teaching it after evolution, at the end of a genetics unit. Additionally, we have had success in establishing motivation in students for this activity by first having the class interactively simulate the same process themselves, either in an abbreviated form at the start of the class or altogether in a previous class. We have used Lab 8 in the older 2001 AP Biology Lab Manual, which has students simulate several generations of a population by randomly exchanging cards with alleles on them, and recommend similar activities (Brewer & Gardner, 2013).

Starting with an interactive simulation helps students build an intuition for what their computer will have to do while illustrating the power of using computers to explore complicated dynamics that are not easily captured without lots of data. Alternatively, Investigation 2: Mathematical Modeling: Hardy-Weinberg in the current AP Biology Investigative Labs manual (within Big Idea 1: Evolution) teaches the same content using a pre-made spreadsheet simulation. This more detailed Hardy-Weinberg activity can be used in conjunction with the activity detailed here, either to offer students a chance to engage with the simulation at a more fundamental level, or to expand the activity into a larger unit covered by a traditional AP Biology course.

Logistically, the activity was designed to take place over two 50-minute periods, but it can also be reasonably taught in one longer period with the discussion questions at the end converted to a worksheet to be completed by students at home. Most importantly, students will need a computer with Internet access, but do not need more than one computer per group. The equations provided are specific to Google Sheets and will not all work correctly in other spreadsheet applications, such as Microsoft Excel or Apple Numbers. All Google Docs, including Sheets, are freely accessible online and easily saved and shared, so students can work on any computer and take their simulation with them afterward. We hope this will make the activity more widely usable for teachers and reusable for students.

The following links (shortened from their longer original Google URLs) make a copy of the source spreadsheets, which will stay active and static, so teachers and students can make as many copies of the original as they like:

Day 1 (50 minutes)

The first day should be spent entirely on building the simulation, which is to be done by the students in small groups. The template spreadsheet helps by ensuring all of the functions work correctly to produce the final figure (see Figure 1, section 7), but there is no reason not to adapt the general concept to a different layout. Additionally, it is easier to work with a smaller population so everything can fit on a computer screen simultaneously, but the simulation really should be applied to more realistic population sizes. We have supplied a version with only 10 individuals, so students can see all 7 sections simultaneously and re-run the simulation many times very quickly, as well as a more realistic version with 100 individuals that takes longer to run, during which students can work on discussion questions. Unfortunately, larger population sizes (e.g., N = 1000) take too long to run after each change in the spreadsheet, making them unusable. For teachers more interested in the output of these kinds of simulations than actually building them, we recommend programs such as AlleleA1 (http://faculty.washington.edu/herronjc/SoftwareFolder/AlleleA1.html).

Figure 1. Layout of the simulation as it appears in the accompanying Google Sheets template (N = 10). Sections are labeled in the order they are to be completed by students: (1) starting population; (2) (reproductive advantage of A)/(reproductive advantage of a); (3) cumulative probability of reproducing; (4) randomly chosen reproducing individuals; (5) population; (6) allelic proportions; (7) graph of allelic proportions. Generations 6 through 96 are not shown to save space. Highlighted cells indicate where students can manipulate the simulation.


Each section should be completed in the order it is listed in Figure 1 using the equations in Table 2. Correct completion of the simulation is important so that students can use it to answer questions about inheritance and selection, but no more so than the process of creating it. As such, students should fill in each section only after its literal function and its relation to the overarching goal of simulating an evolving population have been discussed with the class. Doing so helps prevent students from getting lost and also emphasizes the multistep nature of science, where hypotheses are tested using complicated protocols involving both specific and holistic challenges.

How can you simulate a new mutation?

Start the population with a single heterozygous individual, and all the rest homozygous.

How can you simulate a dramatic environmental shift for a genetically diverse population?

Start the population with approximately equal proportions of A and a spread across homozygous recessive (aa), heterozygous (Aa), and homozygous dominant (AA) individuals. Then use the next section, which specifies the Reproductive Advantage of A, to create a strong selective pressure.

Do heterozygous Aa individuals have the reproductive advantage of A or a?

They have the reproductive advantage of A, the same as homozygous dominant AA individuals, because selection acts on phenotypes and Aa individuals express the dominant phenotype encoded in their A allele.

What does a reproductive advantage of 1 mean? What about values less than 1? What about 0? Are there limits to what this number can be?

When this number is 1, it means all genotypes are equally likely to reproduce. If it is less than 1, it means individuals with the dominant phenotype are less likely to reproduce. A value of 0 means individuals with at least one A allele are unable to reproduce. The only requirement is that this number is not negative, but it can be zero or any positive number, even very small or large ones.

What affects the reproductive advantage of a mutation?

A mutation can intrinsically affect survivorship, as well as how the individual interacts with its environment.

How would the simulation change if the mutation and its reproductive advantage were recessive?

It would disappear more often, as only homozygous recessive individuals would be affected.

Why do all of the numbers range between 0 and 1?

Why do the numbers increase?

This is a cumulative probability distribution (CDF), so the difference between numbers—the bin width—is the probability of that individual reproducing, not the numbers themselves.

Why use a cumulative probability distribution?

A convenient way to randomly pick individuals is to assign each one a part of the number line between 0 and 1, then randomly generate a number between 0 and 1 and choose whichever individual's segment it falls in.
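The number-line idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formulas from the Google Sheets template: the function names `cumulative_probabilities` and `pick_individual` and the example weights are hypothetical and were chosen for this sketch.

```python
import bisect
import random
from itertools import accumulate


def cumulative_probabilities(weights):
    """Turn non-negative reproductive weights into cumulative probabilities.

    The difference between consecutive values (the bin width) is the
    probability that the corresponding individual is chosen.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one individual must be able to reproduce")
    cum = list(accumulate(w / total for w in weights))
    cum[-1] = 1.0  # guard against floating-point rounding in the last bin
    return cum


def pick_individual(cum, u):
    """Return the 0-based index of the individual whose segment contains u.

    u is assumed to lie in [0, 1), as returned by random.random().
    """
    return bisect.bisect_right(cum, u)


# Hypothetical weights: individual 0 is twice as likely to reproduce.
weights = [2.0, 1.0, 1.0]
cum = cumulative_probabilities(weights)  # [0.5, 0.75, 1.0]
parent = pick_individual(cum, random.random())
```

Because the values only ever increase and end at 1, every random number between 0 and 1 falls in exactly one segment, and an individual with a weight of 0 gets a segment of width 0.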

How do the IDs in this section relate to those in the Offsprings’ Parents section?

Individual 1 in Offsprings’ Parents is the ID listed in column 1 of this section, Individual 2 in Offsprings’ Parents is the ID listed in column 2 of this section, and so on for every column.

Is it possible to have asexual reproduction in the simulation?

Yes. If close-enough random numbers are generated, the same individual will be selected to be the mother and father of the same offspring.

Try tracing an individual's alleles back through the simulation to their parents.

Is an aA individual the same as an Aa individual?

How do aA individuals arise?

They arise when the first parent passes on a recessive allele and the second parent passes on a dominant one. The alleles are not sorted, so heterozygous individuals are not always labeled “Aa.”

What is the sum of A and a? Will they always sum to this?

They must sum to 1 because there are only two alleles, so the sum of each's proportion in the population must be the entire population.

What would you expect these proportions to be if there were more than two alleles?

Think about blood types as an example. They still must sum to 1, but otherwise they can be anything.
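The answers above describe a complete generation step: weight each individual by its phenotype, draw two parents per offspring, pass on one allele from each, and tally allele proportions. A minimal Python sketch of that loop follows. It is an assumption-laden stand-in for the spreadsheet, not a transcription of it: the names `next_generation` and `allele_proportions`, the seed, and the advantage of 1.5 are all hypothetical, and `random.Random.choices` is used here for the weighted draw in place of an explicit cumulative-probability table.

```python
import random


def next_generation(population, advantage_A, rng):
    """Build one new generation of the same size.

    population: list of 2-character genotypes such as "AA", "Aa", "aA", "aa".
    advantage_A: non-negative reproductive weight of any individual carrying
                 at least one A (A is dominant), relative to 1 for aa.
    rng: a random.Random instance, so runs can be repeated with a seed.
    """
    weights = [advantage_A if "A" in g else 1.0 for g in population]
    offspring = []
    for _ in population:
        # Two independent weighted draws; the same individual can be drawn
        # twice and so act as both parents of one offspring.
        first, second = rng.choices(population, weights=weights, k=2)
        # Each parent passes on one of its two alleles at random, so "aA"
        # appears when the first parent contributes the recessive allele.
        offspring.append(rng.choice(first) + rng.choice(second))
    return offspring


def allele_proportions(population):
    """Return (proportion of A, proportion of a); the two sum to 1."""
    total = 2 * len(population)  # every individual has two alleles
    count_A = sum(g.count("A") for g in population)
    return count_A / total, (total - count_A) / total


# A new mutation: one heterozygote among otherwise homozygous aa individuals.
rng = random.Random(2024)  # hypothetical seed
population = ["Aa"] + ["aa"] * 9
history = [allele_proportions(population)]
for _ in range(100):
    population = next_generation(population, advantage_A=1.5, rng=rng)
    history.append(allele_proportions(population))
```

Changing the seed and rerunning plays the same role as refreshing the spreadsheet: the rules stay the same while the outcome varies from run to run.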


The equations provided should be entered in the upper left cell of the corresponding section. This cell can then be dragged to the right and down to fill in the entire section. Alternatively, as dragging cells can be tedious, quick cell filling can also be accomplished by highlighting all of the cells, including the upper left cell with the equation already entered, and then pressing Ctrl + R and then Ctrl + D. The reverse order (Ctrl + D followed by Ctrl + R) also works. Either method will also copy the borders on the upper left cell in each section. These borders have no effect on the contents of the cells and can be manually deleted.

The completed spreadsheet will then resemble Figure 2. Refreshing the spreadsheet (Ctrl + R) will automatically recalculate the entire evolutionary simulation, and redraw the final figure (section 7).

Figure 2. An example of a completed simulation. Generations 8 through 96 are not shown to save space. Highlighted cells indicate where students can manipulate the simulation.


Day 2 (50 minutes)

Once the simulation is built, students can focus on using it to explore the relationships among inheritance, natural selection, and chance. We designed the following short-answer, multipart questions as examples for teachers to make this exploration creative and challenging, but also relevant to the previously stated educational goals. Their purpose is to stimulate discussions, either within small groups or as a class. The goal here is not for students to provide correct answers to every sub-question.

Question 1. Conservation biologists are concerned with preserving and promoting genetic diversity. What is the mean generation time for genetic drift to cause a neutral allele (no reproductive advantage or disadvantage) to become fixed in the population? If you were in charge of making decisions that would impact an endangered species, how helpful would this mean generation time be? What else might you want to know?
Reasonable responses should address the idea that the mean alone is not a good basis for conservation decisions. Some sense of variability would allow for more informed action.
Question 2. Genetic drift is an evolutionary mechanism known to cause populations to change from one generation to the next. How long does it take for genetic drift to cause the population to be significantly different in future generations from when it started, based on the allele frequencies? (Hint: The answer is different every time you run the simulation. How many generations are needed so that 50% of the time the population will be significantly different? 75%? 95%?) What statistical test should be used here? If you had to decide whether to classify a species as endangered, does it make sense to rely on statistical significance?
Reasonable responses should address the difference between biological and statistical significance. The chi-square goodness-of-fit test addresses the latter, but effective conservation requires understanding that this is not always relevant because statistically significant effects can be so small that they are likely meaningless to the actual population. An easy-to-use chi-square goodness-of-fit test can be found here: http://turner.faculty.swau.edu/mathematics/math241/materials/contablecalc/. Students will have to convert their proportions back into the actual number of A and a alleles by multiplying the proportions by twice the size of the population (every individual has two alleles).
Question 3. How much do the “starting conditions” (i.e., the allele frequencies in the starting generation) matter? How is an endangered species that used to be common different from a species that was never very numerous? Does the historical difference matter if both species are currently endangered?
Reasonable responses should address the importance of the “starting conditions.” Population bottlenecks (like what cheetahs went through) are so dangerous because even if the number of individuals increases, they will still have less genetic variability, just as if there were never many of them. Moreover, there is no difference between losing genetic variability and never having much in the first place. Moving forward, populations in both situations will struggle to adapt to a changing environment and selective pressures.
Question 4. How much of a reproductive advantage does a mutation need to offer for it to become fixed in the population 50 percent of the time? Did you expect this to be larger? Smaller? Why? How might this depend on the size of the population? What about whether the mutation is dominant or recessive? How much of an advantage do most mutations likely offer?
Reasonable responses should include the idea that it takes a very advantageous mutation or quite a bit of luck (or some combination) for a mutation to become fixed in the population. Additionally, students might comment on how this implies a fast rate of mutations, and how challenging it is for scientists to quantify the advantage or disadvantage of a single mutation.
Question 5. This simulation is built entirely on manipulating random numbers. Where do random numbers come from? Are they actually random? Try to come up with a way of creating random numbers on your own.
Reasonable responses should be a bit philosophical, and address what it means for something to be random. Scientists are still unclear if anything in the Universe is truly random (we think very small particles do in fact behave truly randomly, in accordance with the theory of quantum mechanics), but computers are not capable of producing actually random numbers. They use what are called pseudo-random number generators, which appear random but are actually not. Examples include the Linear Congruence Method (Brunner & Uhl, 1999), the Middle-Square Method (Von Neumann, 1951), the Mersenne Twister (Matsumoto & Nishimura, 1998), and Fortuna (Ferguson & Schneier, 2003).
Question 6. Arieh Warshel, who shared the 2013 Nobel Prize in Chemistry for computer simulations of biological functions, said that “when you do something on [a] computer, it's very easy to dismiss it and say you made it up.” (Chang, 2014). Do you agree? Why?
Reasonable responses should address the pros and cons of theoretical studies and experiments. Theory can lead to more precise and justified conclusions, but often at the expense of being realistic. Experiments offer intrinsically realistic insights, but at the expense of being able to say what exactly caused the outcome of the experiment.
Question 7. Each time you run the simulation, the outcome can change, sometimes dramatically, but each simulation is equally likely. What does this say about the natural world?
Reasonable responses should involve a sense that nothing in the natural world is “meant to be,” but rather the result of a balancing act between chance and advantage. Moreover, if the Universe were to start all over again, it may lead to very different outcomes. We live in just one of those outcomes.
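For Question 2, the conversion from proportions to allele counts and the chi-square goodness-of-fit statistic can be worked through by hand or with a short script. The sketch below is one possible illustration, not part of the original lesson materials: the function names and the example numbers (N = 100, A moving from 0.5 to 0.6) are hypothetical. It also assumes, as the test itself does, that the 2N allele copies can be treated as independent observations.

```python
def allele_counts(p_A, n_individuals):
    """Convert the proportion of A back into counts of A and a alleles."""
    total = 2 * n_individuals  # every individual has two alleles
    count_A = round(p_A * total)
    return count_A, total - count_A


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories.

    Expected counts must be non-zero.
    """
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 100 individuals, A starts at 0.5 and is later 0.6.
n = 100
expected = allele_counts(0.5, n)  # (100, 100)
observed = allele_counts(0.6, n)  # (120, 80)
statistic = chi_square_statistic(observed, expected)  # 8.0

# With two alleles there is 1 degree of freedom; the critical value at the
# 0.05 significance level is 3.841.
significantly_different = statistic > 3.841
```

In this made-up case the statistic is 8.0, above the 3.841 threshold, so the change would count as statistically significant. Whether it is biologically meaningful is the separate question the discussion is meant to raise.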

Dobzhansky wrote that “seen in the light of evolution, biology is, perhaps, intellectually the most satisfying and inspiring science. Without that light it becomes a pile of sundry facts—some of them interesting or curious but making no meaningful picture as a whole.” (Dobzhansky, 1973). Ensuring that students leave high school with Dobzhansky's light is as important a task as any for a high school biology teacher, and one that requires providing students with activities that require them to think like scientists. This lesson will help with this challenging yet essential aspect of biology education.

The authors would like to thank D. S. Goldberg and D. F. Doak for their help in designing this lesson and making it more usable in a classroom setting. Paul Strode's 2015/16 IB/AP Biology students and teachers in an AP Biology Summer Institute field-tested an earlier version of the spreadsheet activity. A. P. Martin provided valuable feedback on the first draft of the manuscript, and comments from two anonymous reviewers greatly improved the clarity of the paper. Graduate funding for Ryan Langendorf was provided by National Science Foundation grants GK-12 0841423 and DGE-1144083.

