Well over 15,000 years ago, a man and a bear died in a cave in the Jura Mountains in modern-day Switzerland. That was the end of the story for millennia, until their remains were discovered in 1954 by researchers investigating the cave. Further work in the 1990s revealed that the man had shot the bear with an arrow. This established their bond beyond a coincidentally shared grave and identified the man as a hunter-gatherer. Now, thousands upon thousands of years after he lived, geneticists are developing new methods to analyze this hunter-gatherer’s DNA in an effort to better understand genetic diversity in ancient humans, and how it compares to our diversity today.
In a report published this month in GENETICS, Kousathanas et al. describe a method to infer the level of genetic diversity from sequence data that is unsuitable for more common methods. This method improves our ability to analyze genomic sequences from ancient DNA samples, as well as other datasets with less-than-ideal sequence data.
Heterozygosity, the presence of two different bases or alleles at a single site, is a marker of genetic diversity, and the degree of heterozygosity throughout a particular genome segment gives insight into that region’s evolutionary history. Identifying (or “calling”) heterozygosity is fairly straightforward with high-quality DNA and current sequencing technologies, which produce very high depths of coverage. Depth of coverage refers to the number of times a specific base in the genome is represented in the sequencing data; current next-generation sequencing methods can produce 30-40X coverage, sometimes higher. Existing analysis pipelines can easily call heterozygosity from high-coverage data.
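To make the idea of depth of coverage concrete, here is a minimal Python sketch. It assumes reads are given as hypothetical (start, length) pairs on a 0-based linear genome; real pipelines work from alignment files instead, and the function names are illustrative only.

```python
def per_site_depth(reads, genome_length):
    """Count how many reads overlap each position of a linear genome.

    `reads` is a list of (start, length) pairs with 0-based starts.
    This layout is an assumption made for illustration.
    """
    depth = [0] * genome_length
    for start, length in reads:
        # Clip reads that run past the end of the genome.
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


def mean_coverage(reads, genome_length):
    """Average depth across all positions (the "X" in 30X coverage)."""
    return sum(per_site_depth(reads, genome_length)) / genome_length


# Three short toy reads on a six-base genome.
toy_reads = [(0, 4), (2, 4), (3, 2)]
print(per_site_depth(toy_reads, 6))
print(mean_coverage(toy_reads, 6))
```

A sample sequenced to 30X has, on average, thirty reads stacked over each site, whereas ancient samples often have only a handful.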
Calling heterozygosity from low coverage data is much harder because sequencing machine errors can be mistaken for true genetic variation. Low coverage sequence data is often all that can be derived from the tiny amounts of degraded DNA typically recovered from ancient tissue samples. To make things worse, ancient samples are also affected by postmortem DNA damage, which can dramatically increase the number of sequencing errors in the data. Determining levels of genetic diversity from prehistoric human DNA is a challenge, and it’s one that Kousathanas et al. attempt to solve by creating a method that uses a probabilistic framework to infer heterozygosity.
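A quick back-of-the-envelope calculation shows why low depth blurs the line between errors and real variation. The sketch below is illustrative only and makes two simplifying assumptions: each read samples one of the two alleles with equal probability, and every base has the same independent error rate.

```python
def prob_het_looks_homozygous(depth):
    """Chance that every read at a truly heterozygous site shows the same
    allele, assuming equal allele sampling and no sequencing errors."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth


def prob_hom_shows_mismatch(depth, error_rate):
    """Chance that at least one read at a truly homozygous site carries an
    erroneous base, assuming independent errors at a fixed rate."""
    return 1.0 - (1.0 - error_rate) ** depth


for depth in (1, 2, 4, 30):
    print(depth, prob_het_looks_homozygous(depth),
          prob_hom_shows_mismatch(depth, 0.01))
```

Under these assumptions, half of all heterozygous sites covered by only two reads show a single allele, so simple counting would miss them, while at 30X the chance is negligible. At the same time, any mismatching read at low depth could be either a second allele or an error, which is why a probabilistic treatment is needed.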
In broad terms, the method involves three steps: 1) estimate the parameters of models that describe postmortem DNA damage, since that damage causes signature base substitutions, particularly at the beginning of sequencing reads; 2) recalibrate the quality scores given by sequencing machines by assuming a section of the sequence is monomorphic (e.g., the X chromosome in human males), which allows for better determination of the base-specific error rates present in the data; and 3) infer heterozygosity while accounting for the inferred DNA damage profiles and the recalibrated quality scores. The method produces very accurate estimates of heterozygosity for regions about one megabase in size, which the scientists demonstrate first by analyzing a simulated chromosome.
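The third step can be illustrated with a toy Python example. This is not the authors' implementation: it skips damage modeling and quality recalibration entirely, assumes a single known error rate, treats all genotypes of a given type as equally likely, and finds the estimate with a simple grid search. All function names are hypothetical. It only shows the general idea of summing over possible genotypes at each site rather than calling them outright.

```python
import math
import random
from itertools import combinations

BASES = "ACGT"
HOMOZYGOUS = [(b, b) for b in BASES]         # 4 possible genotypes
HETEROZYGOUS = list(combinations(BASES, 2))  # 6 possible genotypes


def read_prob(observed, allele, err):
    """P(observed base | true allele), with errors spread evenly."""
    return 1.0 - err if observed == allele else err / 3.0


def genotype_likelihood(reads, genotype, err):
    """P(all reads | genotype); each read samples either allele equally."""
    a, b = genotype
    lik = 1.0
    for obs in reads:
        lik *= 0.5 * read_prob(obs, a, err) + 0.5 * read_prob(obs, b, err)
    return lik


def site_likelihoods(reads, err):
    """Average likelihood over homozygous and over heterozygous genotypes."""
    hom = sum(genotype_likelihood(reads, g, err) for g in HOMOZYGOUS) / 4
    het = sum(genotype_likelihood(reads, g, err) for g in HETEROZYGOUS) / 6
    return hom, het


def estimate_heterozygosity(sites, err, grid_size=200):
    """Grid-search maximum-likelihood estimate of the heterozygous fraction.

    Requires err > 0 so that every site likelihood is positive.
    """
    per_site = [site_likelihoods(reads, err) for reads in sites]
    best_h, best_ll = 0.0, -math.inf
    for i in range(grid_size + 1):
        h = i / grid_size
        ll = sum(math.log((1 - h) * hom + h * het) for hom, het in per_site)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h


def simulate_sites(n_sites, depth, h, err, rng):
    """Simulate reads under the same toy model used for inference."""
    sites = []
    for _ in range(n_sites):
        pool = HETEROZYGOUS if rng.random() < h else HOMOZYGOUS
        genotype = rng.choice(pool)
        reads = []
        for _ in range(depth):
            base = rng.choice(genotype)
            if rng.random() < err:
                base = rng.choice([b for b in BASES if b != base])
            reads.append(base)
        sites.append(reads)
    return sites


if __name__ == "__main__":
    rng = random.Random(42)
    # Unrealistically high heterozygosity, chosen so the toy grid resolves it.
    sites = simulate_sites(2000, depth=3, h=0.1, err=0.01, rng=rng)
    print(estimate_heterozygosity(sites, err=0.01))
```

Because the estimate pools evidence across many sites, no individual site needs to be called confidently, which is what makes this style of inference workable at low depth. The simulated heterozygosity here is far higher than in real human genomes, purely so that a coarse grid can resolve it.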
To compare genetic diversity in ancient and modern humans, Kousathanas et al. analyzed genomic data from four prehistoric individuals from Europe and Africa—including the hunter-gatherer found in Switzerland—and several male individuals from the 1000 Genomes Project.
The researchers found that both ancient and modern African samples show much greater genetic diversity than European samples. Diversity inferred from the ancient European samples was lower than that found in modern samples, which the authors attribute to smaller Paleolithic population sizes.
The ability to more accurately analyze genomic diversity in “difficult” DNA samples may provide a more detailed look into the past, allowing a better understanding of the evolutionary processes that have shaped genetic variation.
Kousathanas, A., Leuenberger, C., Link, V., Sell, C., Burger, J., Wegmann, D. 2017. Inferring Heterozygosity from Ancient and Low Coverage Genomes. GENETICS 205(1): 317-322. doi: 10.1534/genetics.116.189985