Where do humans come from? Who are our close ancestors? How many of us existed before we split from the common ancestors? The answers to these questions lie in the signatures of past evolutionary events hidden in the genome, and finding them from vast genomic sequences requires sophisticated statistical algorithms. In a new G3: Genes|Genomes|Genetics study, Soni and Jensen present a modified two-step approach to model and characterize the evolutionary parameters governing human history.
Genetic variation is the raw material of evolution. Thus, a central concern of population genetics lies in quantifying the relative contribution of neutral and selective processes in shaping observed levels of variation. Using high-quality human genome sequences from thousands of individuals, population geneticists can attempt to model population history. These models are often complex and require the inference of numerous parameters, such as the effective population size, timing of population splits, rates of population size change, and gene flow between populations. A well-fitting model describes how the interplay of evolutionary processes has generated observed patterns of genomic diversity. Any model, however complex or well-fitting, simplifies the actual population history. Therefore, researchers continue to develop statistical approaches that incorporate multiple parameters to reproduce observed genomic patterns, while also accurately determining how the interaction of evolutionary processes leads to the observed levels of variation.
In the April issue of G3, Soni and Jensen propose a modified two-step approach to infer demographic parameters and the distribution of fitness effects of new mutations to understand how natural selection shapes genetic variation. The researchers used data from the 1000 Genomes Project, consisting of African, European, South Asian, and East Asian populations, to build a model that accurately describes the roles of selective and neutral evolutionary processes, accounting for the biasing effects of selection on the population history inference, and of demography on the inference of selection.
In the first step, they performed demographic inference on non-functional genomic regions distant enough from coding regions to avoid the confounding effects of purifying and background selection. In the second step, the researchers inferred the distribution of fitness effects of new mutations using functional regions, conditional on the demographic model inferred in the first step.
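To make the logic of the two steps concrete, here is a minimal toy sketch in Python. It is not the authors' method or software. It assumes a constant-size population (so step 1 reduces to Watterson's estimator of the per-site diversity parameter theta) and a deliberately crude two-class distribution of fitness effects in which a fraction `f` of new mutations is neutral and the rest are so deleterious that they never segregate. The function names and the example site-frequency spectra are hypothetical.

```python
def harmonic(n):
    """Watterson's constant a_n = sum of 1/i for i = 1 .. n-1."""
    return sum(1.0 / i for i in range(1, n))


def infer_theta_neutral(neutral_sfs, n_sites):
    """Step 1 (toy): estimate per-site theta from neutral regions only.

    neutral_sfs[i-1] is the number of sites where the derived allele is
    seen i times in a sample of n chromosomes (i = 1 .. n-1).
    Assumes a constant-size population, a strong simplification.
    """
    n = len(neutral_sfs) + 1
    return sum(neutral_sfs) / (harmonic(n) * n_sites)


def infer_neutral_fraction(functional_sfs, n_sites, theta):
    """Step 2 (toy): estimate the neutral fraction f in functional regions,
    conditional on the theta inferred in step 1.

    Under the assumed two-class model, functional diversity is the neutral
    expectation scaled by f, so f is the ratio of observed to expected
    segregating sites.
    """
    n = len(functional_sfs) + 1
    expected_if_all_neutral = theta * n_sites * harmonic(n)
    return sum(functional_sfs) / expected_if_all_neutral


# Hypothetical counts for a sample of n = 5 chromosomes.
neutral_sfs = [120, 60, 40, 30]      # from 100,000 non-functional sites
functional_sfs = [24, 12, 8, 6]      # from 50,000 functional sites

theta = infer_theta_neutral(neutral_sfs, n_sites=100_000)
f = infer_neutral_fraction(functional_sfs, n_sites=50_000, theta=theta)
print(f"step 1: theta per site = {theta:.6f}")
print(f"step 2: neutral fraction f = {f:.3f}")
```

The point of the sketch is the ordering: the demographic parameter is fixed using sites assumed to be free of selection, and only then is the selection parameter estimated against that baseline, so that a deficit of functional variation is not misread as a smaller population.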
Together, the two-step approach achieves a model of evolution that incorporates 25 demographic parameters capturing sizes, bottleneck severities, growth rates, timings of each event, and migration rates between five human populations. Further, this well-fitting baseline model is a suitable null when performing inference of more episodic processes such as positive and balancing selection.
References
Vivak Soni, Jeffrey D. Jensen
G3: Genes|Genomes|Genetics. April 2025. 15(4).
DOI: 10.1093/g3journal/jkaf019