50 years of molecular evolution in Drosophila
In the genomic era, population geneticists are flooded with molecular data on the evolution of natural populations. This deluge started in 1966 as a trickle of data from protein electrophoresis studies, including the landmark GENETICS papers published by Richard Lewontin and John Hubby. As Lewontin is honored this week at the Annual Drosophila Research Conference with the Thomas Hunt Morgan Medal, the latest chapter of FlyBook looks back over the fifty years of insights into molecular evolution since fruit flies first burst open the field. Casillas and Barbadilla provide a stunningly thorough journey through the history of molecular population genetics with an emphasis on the contributions of Drosophila.
Beginning in the late 1910s, population genetics unified the work of Darwin and Mendel, leading to a rich theoretical body of work by the 1960s. Population geneticists used mathematical modeling to describe how the forces of natural selection, migration, mutation, and genetic drift work together to shape patterns of genetic variation in nature. However, this theoretical framework remained untested until the arrival of allozymes, protein variants encoded by different alleles of the same gene. Due to differences in their amino acid content, allozymes travel through a gel at different rates when a current is applied, allowing them to be differentiated from one another. Lewontin and Hubby’s 1966 papers surveyed allozyme variation at several different loci in natural populations of D. pseudoobscura, a wild North American fruit fly. It quickly became common practice to survey natural populations for allozyme variation in known proteins, allowing scientists to quantify how much genetic variation was actually present.
These studies revealed that genetic variation was plentiful in the wild, much more plentiful than had been predicted. This discovery created a conundrum that was eventually solved by Motoo Kimura’s Neutral Theory, which stated that the majority of variation present in a population has no effect on organismal fitness and that allele frequencies are shaped by the random effects of genetic drift. Following soon after, Tomoko Ohta’s Nearly Neutral Theory incorporated effective population size into the model to predict that in small populations, mildly deleterious mutations would behave as if they were neutral. Though debate continues today about the relative importance of neutral genetic drift versus natural selection, the stage was set by these first studies with allozymes.
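The dependence on population size can be made concrete with a small numerical sketch. It uses the standard diffusion approximation for the fixation probability of a single new mutation, which is a textbook result rather than something taken from the FlyBook chapter. It assumes additive selection and an effective population size equal to the census size, and the function names are purely illustrative. Under those assumptions, the same mildly deleterious mutation fixes almost as often as a neutral one in a small population, but almost never in a large one.

```python
import math

def fixation_probability(s, n):
    """Probability that a single new mutation eventually fixes.

    Diffusion approximation for a diploid population of size n, assuming
    additive selection with coefficient s and an effective size equal to n
    (both are simplifying assumptions of this sketch).
    """
    if s == 0:
        return 1.0 / (2 * n)      # neutral case: equals the initial frequency
    exponent = -4.0 * n * s
    if exponent > 700:            # strongly deleterious: avoid float overflow
        return 0.0
    # (1 - e^(-2s)) / (1 - e^(-4ns)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)

def relative_to_neutral(s, n):
    """Fixation probability divided by the neutral expectation 1/(2n)."""
    return fixation_probability(s, n) * 2 * n

# The same slightly deleterious mutation in populations of different sizes.
for n in (100, 10_000, 1_000_000):
    print(n, relative_to_neutral(-1e-4, n))
```

A ratio close to 1 means the mutation is effectively neutral at that population size; a ratio near 0 means selection removes it efficiently.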
The limitations of allozyme studies were well known, however; only amino acid differences that caused a change in the overall charge of the protein could be surveyed. The hunt was on for a more comprehensive way to measure genetic variation. The first population genetic survey of nucleotide variation was carried out in 1983 by Marty Kreitman at the Adh locus in 11 D. melanogaster individuals. This locus had two well-known allozyme alleles, and just one of the 43 SNPs he identified caused the electrophoretic difference between them.
Directly sequencing nucleotides opened up entirely new areas of study, particularly in examining different functional sites like introns, UTRs, and coding sequences. Advances in sequencing technology placed more data within reach of more scientists, leading to many breakthroughs in our understanding of the dynamics of genetic variation in populations. Nucleotide substitutions that changed a protein’s amino acid sequence were found to be mostly deleterious and generally rare, whereas synonymous variation appeared to be mostly neutral and more common. The critical role of recombination in maintaining genetic variation was discovered, and methods for detecting the signatures of natural selection were developed.
Though nucleotide sequencing offered scientists a glimpse of the fundamental structure of genetic variation, focusing on a small handful of genes was still a limited and biased way of sampling polymorphism. After all, a single gene is typically kilobases long, while an entire genome can have billions of bases. After whole genome sequencing came to prominence around the new millennium, population genomics began to grow. In 2000, D. melanogaster was the third eukaryote to have its entire genome sequence published, and in 2007 the first true population genomics study was carried out in the closely related D. simulans by David Begun and colleagues. They used whole genome shotgun sequencing to examine unbiased variation in seven different lines of D. simulans, showing how polymorphism varied according to chromosomal location and functional region.
As next-generation sequencing technologies emerged, studies on whole genome variation became more common and brought a flood of new data. In 2012, Trudy Mackay and colleagues presented the population genomics community with a crucial new resource: the Drosophila Genetic Reference Panel, a set of fully sequenced genomes of over 200 D. melanogaster individuals from a single population in Raleigh, NC. This remains the most comprehensive population genetic study of any single species to date. Population genomic studies have shed light on many important evolutionary questions, including the roles of recombination, mutation, and natural selection in maintaining and generating polymorphism across genomes. Drosophila species, with their typically enormous effective population sizes, have proven to be an excellent model for detecting genomic signatures of selection.
Just 50 years after the first allozyme studies were performed, population genomics has produced an astounding bounty of data on natural genetic polymorphism. Yet as usual, more data brings more questions, and many important avenues of discovery remain. One outstanding challenge is connecting the layers between phenotype and genotype. Natural selection acts on the outward traits of an organism, and these changes are assumed to be reflected in the genotype. How does selection affect the variation in transient structures like the transcriptome and methylome, and how are those changes translated into the genome? We look forward to the many presentations at the 58th Annual Drosophila Research Conference that are seeking answers to these questions and more.
Casillas, S., & Barbadilla, A. (2017). Molecular Population Genetics. GENETICS, 205(3), 1003-1035. DOI: https://doi.org/10.1534/genetics.116.196493