diploS/HIC uses machine learning to identify selective sweeps in unphased data.


A set of footprints can tell us a lot about the creature that left them—without requiring us to see the creature itself. Footprints can suggest the animal’s size, weight, and stride, and from there, we can extrapolate even more information. Much like soft sand holds evidence of past visitors to a beach, genomes retain the marks of selection long after the evolutionary pressures that caused them are gone. In this month’s issue of G3: Genes|Genomes|Genetics, a featured paper by Kern and Schrider provides a valuable tool for finding and classifying such genomic traces of past selection.

Putting together a complete picture of the genomic loci that have been targeted by selection requires both accurately identifying selective sweeps and classifying them by type. Hard sweeps occur when a rare but beneficial mutation arises de novo, confers an adaptive advantage, and quickly reaches fixation in a population. Soft sweeps, on the other hand, are derived from standing variation: when a previously neutral variant suddenly confers an advantage due to a change in environment, it can rise to fixation.

However, distinguishing the genomic effects of these two processes is challenging. A hard sweep can produce spurious signatures of a soft sweep in genomic regions that lie at an intermediate distance from the target of selection: not too close and not too far. Any tool that tries to classify selective sweeps therefore has to take into account the distance from the true target of selection.

In 2016, Kern and Schrider developed a technique called S/HIC that uses machine learning to identify selective sweeps while accounting for genomic spatial context. Their newly reported method, diploS/HIC (pronounced “deep-lo-shick”), improves on the original by allowing the use of unphased data, that is, genotype data in which alleles have not been assigned to haplotypes (“phased”). The new method also incorporates additional spatial information through a machine learning approach commonly used in image recognition. By representing population genomic data as images, the authors were able to train a deep convolutional neural network to identify genomic windows that bear the signature of hard and soft selective sweeps. They demonstrate diploS/HIC by simulating different selective scenarios based on data from an Anopheles gambiae mosquito population.
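To make the "genomic data as an image" idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the two summary statistics, and the number of subwindows are all assumptions chosen for illustration, and the post does not say which statistics or image dimensions diploS/HIC actually uses. The sketch collapses phased haplotypes into unphased genotype counts, splits a genomic window into subwindows, and arranges per-subwindow statistics into a small 2D grid of the kind a convolutional network could take as input.

```python
# Illustrative sketch only: names, statistics, and dimensions are assumptions,
# not the diploS/HIC implementation.

def unphase(hap_a, hap_b):
    """Collapse two haplotypes (0/1 alleles) into unphased genotype counts (0, 1, 2)."""
    return [a + b for a, b in zip(hap_a, hap_b)]

def het_fraction(site):
    """Fraction of individuals that are heterozygous (genotype 1) at one site."""
    return sum(1 for g in site if g == 1) / len(site)

def alt_freq(site):
    """Frequency of the alternate allele at one site in a diploid sample."""
    return sum(site) / (2 * len(site))

def split_subwindows(sites, n_sub):
    """Split a list of sites into n_sub contiguous, near-equal subwindows."""
    size, extra = divmod(len(sites), n_sub)
    out, start = [], 0
    for i in range(n_sub):
        end = start + size + (1 if i < extra else 0)
        out.append(sites[start:end])
        start = end
    return out

def normalize(row):
    """Rescale a row to sum to 1, so it records the relative spatial pattern."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [v / total for v in row]

def window_image(sites, n_sub, stats=(het_fraction, alt_freq)):
    """Build a (statistics x subwindows) grid: one row per statistic."""
    subs = split_subwindows(sites, n_sub)
    image = []
    for stat in stats:
        row = [sum(stat(s) for s in sub) / len(sub) if sub else 0.0 for sub in subs]
        image.append(normalize(row))
    return image

# Toy data: 6 sites x 4 individuals, with reduced diversity in the middle,
# loosely mimicking the dip in variation around a sweep.
sites = [
    [0, 1, 1, 2], [1, 1, 0, 2],   # left flank
    [0, 0, 0, 0], [0, 0, 0, 1],   # centre
    [1, 2, 1, 0], [1, 0, 1, 1],   # right flank
]
image = window_image(sites, n_sub=3)
for row in image:
    print([round(v, 3) for v in row])
```

In this toy example the central column holds the smallest value in each row, which is the kind of spatial pattern a classifier can learn to associate with proximity to a sweep. Because the statistics are computed from genotype counts alone, nothing here requires phased haplotypes.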

Cutting-edge population genetics methods are constantly in development, and updates like diploS/HIC that broaden researchers’ ability to use datasets with different characteristics will continue to advance the field. Our ability to track the genomic footprints of selection is getting better all the time.

CITATION

diploS/HIC: An Updated Approach to Classifying Selective Sweeps
Andrew D. Kern and Daniel R. Schrider
G3: GENES, GENOMES, GENETICS June 1, 2018 vol. 8 no. 6 1959-1970
https://doi.org/10.1534/g3.118.200262

Sarah Bay: Scientific Editor and Programs Manager. Genetics and Molecular Biology PhD. Twitter: @_sbay