Milking the Data: How genomic selection herded in a breeding boom
Sometimes, great advances in science come from combining the old with the new. Genomic selection is one such case; in 2001, Meuwissen, Hayes, and Goddard surveyed the changing landscape of genetics, had the foresight to work on a then-theoretical problem, and laid the foundation for a boom in biotechnology-assisted breeding that continues to this day. In the May issue of GENETICS, Associate Editor and G3 Deputy Editor Dirk-Jan de Koning introduces Meuwissen et al. as one of the journal’s 100th anniversary Classics.
Humans have been shaping plants and animals through selective breeding for thousands of years. In the 18th century, breeders started keeping detailed records, and a more systematic approach to selective breeding began after World War II. In this approach, the “breeding value” of a given animal was determined through detailed pedigree information, phenotyping of important traits, and test breeding. During the 20th century, the field of genetics grew by leaps and bounds, allowing breeders to add molecular genetic tools to their toolboxes and improving dairy cattle, sheep, and maize breeding – just to name a few.
To combine genetics and selective breeding, scientists had to identify genetic markers connected to traits of interest. When a marker is genetically linked to a gene or a quantitative trait locus (QTL) influencing the trait of interest, plant and animal scientists can indirectly test for the presence of the favorable alleles affecting the trait by genotyping for the marker. Many scientists thought this technique – called marker-assisted selection (MAS) – would change the game for selective breeding, allowing faster progress in crops and livestock than phenotyping alone could achieve. But MAS relied on marker maps that were relatively sparse and QTL that had to be carefully – and labor-intensively – validated. It wasn’t quite the leap forward many had hoped it would be.
In 2001, on the cusp of the completion of the Human Genome Project, it seemed likely that marker maps with much higher density would soon be available for crops and livestock; maybe these dense marker maps would be the real game-changer. There was a problem, though: mapping QTL using the methods designed for sparse marker maps wasn’t greatly improved by a higher density of marker information. Without a better way to use such data, it wouldn’t markedly improve assessment of genetic value compared to traditional phenotyping and MAS.
In a move that showed great foresight, Meuwissen et al. took on this challenge, devising a way to accurately predict genetic value from dense marker maps.
Instead of finding the most significant markers or QTL, they focused on a way to estimate the effects of all markers together without testing for significance. Using a simulated data set of 1010 genetic markers and 1000 QTL, Meuwissen et al. tested four different modeling methods – least-squares linear regression, best linear unbiased prediction (BLUP), and two Bayesian analyses – for accuracy in predicting the breeding value of an individual genotyped for many alleles. According to their data, BLUP and the Bayesian methods were able to predict breeding values with accuracies upwards of 0.73.
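To make the idea of estimating all marker effects together concrete, here is a minimal sketch of a ridge-regression (BLUP-style) predictor on toy simulated data. It is not the simulation or code from Meuwissen et al.: the function name, sample sizes, heritability, and the choice to place effects directly on independent markers are all assumptions made for illustration, and the shrinkage parameter is derived from the simulated variances rather than estimated from the data.

```python
import numpy as np


def ridge_blup_marker_effects(Z, y, lam):
    """Jointly estimate all marker effects by ridge regression.

    Hypothetical helper for illustration: solves
    (Zc'Zc + lam * I) g = Zc'yc on centered genotypes and phenotypes,
    shrinking every marker effect toward zero instead of testing
    individual markers for significance.
    """
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    n_markers = Z.shape[1]
    return np.linalg.solve(Zc.T @ Zc + lam * np.eye(n_markers), Zc.T @ yc)


rng = np.random.default_rng(0)

# Toy dimensions (assumed, not taken from the paper).
n_train, n_test, n_markers, n_qtl = 500, 200, 1000, 50
h2 = 0.5  # assumed heritability of the simulated trait

# Genotypes coded 0/1/2, independent markers with allele frequency 0.5.
Z = rng.binomial(2, 0.5, size=(n_train + n_test, n_markers)).astype(float)

# A random subset of markers carries a true effect; the rest have none.
true_effects = np.zeros(n_markers)
qtl_idx = rng.choice(n_markers, size=n_qtl, replace=False)
true_effects[qtl_idx] = rng.normal(0.0, 1.0, size=n_qtl)

# True breeding values, plus noise scaled to give the assumed heritability.
tbv = Z @ true_effects
noise_sd = np.sqrt(tbv.var() * (1.0 - h2) / h2)
y = tbv + rng.normal(0.0, noise_sd, size=tbv.shape)

Z_train, Z_test = Z[:n_train], Z[n_train:]
y_train = y[:n_train]

# Shrinkage = residual variance / per-marker effect variance.
# With genetic variance spread evenly over markers and 2pq = 0.5,
# this reduces to n_markers * 0.5 * (1 - h2) / h2.
lam = n_markers * 0.5 * (1.0 - h2) / h2

effects = ridge_blup_marker_effects(Z_train, y_train, lam)

# Predict breeding values for individuals with genotypes but no phenotypes.
gebv_test = (Z_test - Z_train.mean(axis=0)) @ effects

# Accuracy = correlation between predicted and true breeding values.
accuracy = np.corrcoef(gebv_test, tbv[n_train:])[0, 1]
print(f"accuracy in unphenotyped individuals: {accuracy:.2f}")
```

The point of the sketch is that there are more markers than phenotyped individuals, so ordinary least squares cannot fit every marker at once; shrinking all effects toward zero makes the joint estimate possible. The accuracy it reports depends entirely on the toy assumptions above and should not be compared with the figures in the paper.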
This was a significant breakthrough, and it was met with enthusiasm by agricultural science communities, but at the time of publication, this work was still theoretical. The dense marker maps for crops and livestock necessary to build the marker haplotypes needed for this method – termed genomic selection – didn’t yet exist.
Genomic selection spent five years in the realm of “potentially field-changing” before technology caught up with theory: medium-density SNP arrays became available for many agricultural species, and real-world applications took off almost immediately.
Dairy cattle breeders were the first to benefit. Traditional phenotyping of sires had been painfully slow and expensive – for example, testing important milk-related traits involved waiting years for the female offspring of a bull to reach maturity. An extensive data recording structure was already in place to facilitate progeny testing of candidate bulls, which made it easy to implement genomic selection without major changes in data recording. Genomic selection is now routine in the dairy cattle industry. It is also becoming an increasingly important tool for many other livestock and crop species, less than a decade since new genotyping technologies made genomic selection a reality. Fittingly, Meuwissen and Goddard were recognized this year by the National Academy of Sciences; they received the John J. Carty Award for contributions to the agricultural sciences.
Since Meuwissen et al. published their landmark work in GENETICS, the GSA journals have continued to serve as a crucial forum for advances and debate in genomic selection. GENETICS and G3 host an ongoing genomic selection series, which includes more than 50 papers. This collection includes reviews, methods, tools, and applications, plus access to valuable data sets for comparing and benchmarking new approaches.
Meuwissen, T.H.E., Hayes, B.J., Goddard, M.E. 2001. Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps. GENETICS, 157(4): 1819-1829. http://www.genetics.org/content/157/4/1819