Fecal alchemy: Turning poop into genomics gold
When it comes to genotyping technology, poop genetics is stuck in the 1990s. While most geneticists are now awash in genome-scale data from thousands of individuals, those who depend on fecal and other non-invasively collected samples still rely on old-school, boutique panels of a dozen or so genetic markers.
But feces, along with fur, feathers, and urine, is critically important stuff for understanding the population genetics, ecology, evolution, behavior, and conservation of wild animals. Many species are too elusive or endangered to allow collection of blood samples, and even for common species it is a logistical nightmare to immobilize and draw blood from large numbers of animals in the field. In the latest issue of GENETICS, Snyder-Mackler et al. describe tools that promise to advance studies of such samples into the genomic era.
Noninvasively collected samples have the obvious advantage of easy access. “We have freezers and freezers full of baboon poop,” says study co-leader Jenny Tung (Duke University). Tung’s group works on behavior and genetics in a wild baboon population in Kenya. But though abundant, poop also presents serious challenges for standard genetic analysis. The DNA present in noninvasive samples is typically a fragmented mixture of host and contaminant sequence. For example, only around 1% of the DNA in a fecal sample comes from the animal that produced the poop. Most of the rest is microbial.
These limitations were first overcome in the 1980s and 1990s, and the ability to analyze DNA from noninvasive samples revolutionized the field. Using such samples not only allowed geneticists to understand the genetic diversity and viability of endangered animals, but also allowed them to empirically test important theories about animal behavior and evolution.
“There are many examples. Noninvasive sampling of chimps, baboons, rhesus macaques and other primates revealed that animals really do bias their behavior towards relatives, even paternal relatives that are likely more difficult for an individual to identify as kin,” says Tung. “And in baboons, it also showed that males provide some paternal care to their offspring, which wasn’t expected for a polygamous primate.”
But the genotyping methods used in such studies have changed surprisingly little over the last twenty years. For the most part, researchers still use small groups of carefully validated markers, usually based on stretches of short tandem repeat sequences (microsatellites). This means the field has mostly missed out on the benefits of genomics that have become routine for medical researchers and those who work with laboratory organisms.
“Microsatellite approaches still work. But over the last 5 or 10 years it has become impossible to ignore the way genome-scale datasets allow you to answer entirely different questions,” says Tung.
For example, data on how a genome varies across a population can provide crucial evidence of the evolutionary and demographic forces that have shaped it. Genomic data can also trace in detail the mergers and separations of mixing populations.
The good news for poop genomics is that short-read next-generation sequencing methods are well suited to the fragmented DNA found in noninvasive samples. These methods have been famously adapted for analyzing a sample type that also suffers from vanishingly small amounts of target sequence: ancient DNA. The bad news is that the expensive, intensive approaches that work well for a precious sample of Neanderthal bone are not practical for a geneticist facing a freezer full of poop.
About six years ago, Tung’s friend and colleague George (PJ) Perry published a major advance that allowed large-scale resequencing from noninvasive samples. It was based on a method known as sequence capture, which enriches for host sequence using synthetic RNA “baits” to capture the target DNA. Tung was excited by the possibilities of the method, but realized it was still too expensive for most applications. This was partly because the baits had to be custom-designed and synthesized for the species of interest. The method also had the drawback of capturing only a tiny fraction of the genome, while consuming large amounts of sample.
“Even fecal samples are exhaustible,” says Tung. “We have a lot of irreplaceable samples from dead animals, for instance. If we’re going to use them up, we want to cover all our bases and gather data on a truly genome-wide scale.”
So Tung’s group and their collaborators worked to modify and scale up Perry’s protocol. They also constructed the baits in a considerably cheaper way, using in vitro transcription of RNA from baboon DNA templates, sidestepping the need for custom synthesis. The new protocol had more modest input DNA requirements and could enrich the target DNA by 40-fold.
But getting enough sequence per sample was just the beginning. Xiang Zhou (University of Michigan) led the group’s efforts to develop tools to analyze data from the new method. Zhou says one of the reasons microsatellites became so popular was the availability of standard and easy-to-use software for assigning paternity from the data. “If people are going to transition to a new method, we thought it would be incredibly important that we package our models into software that will make it as easy as possible,” says Zhou.
But to develop something comparable for low-coverage sequence, the team faced two major challenges: the data is simultaneously much richer (more sequence) and much lower quality (more uncertainty). To deal with the large quantity of data they needed much more computationally efficient algorithms. They also had to factor in the lower data quality, which makes it impossible to use the simpler approaches that work when the genotype at each site is known with certainty. Instead, they incorporated the error rate across all the sites in the genome, generating a sophisticated statistical model.
Using the new capture method and the paternity assignment software (called WHODAD), the team was able to construct pedigrees from baboon fecal samples that almost perfectly matched those created using traditional analysis of high-quality DNA from blood. In short, despite the low coverage of the genome (typically less than 1x), and the resulting very high uncertainty of the genotype at any one site, the trends in the data were more than enough to reconstruct family relationships.
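To get a feel for how many very uncertain sites can add up to a confident answer, here is a minimal Python sketch of a likelihood-based comparison between "this candidate is a parent" and "this candidate is unrelated". It is not WHODAD's model and makes no claim to match it. The function names, the 1% per-read error rate, the restriction to biallelic sites, and the assumption of Hardy-Weinberg genotype frequencies are all illustrative choices made for this example. Instead of calling a genotype at each site, it sums over all three possible genotypes, weighting each by how well it explains the handful of reads observed.

```python
import math

ERR = 0.01  # assumed per-read sequencing error rate (illustrative value only)


def read_lik(reads, g, err=ERR):
    """P(reads | genotype), up to a constant binomial coefficient.

    reads is (ref_count, alt_count); g is the number of alt alleles (0, 1, 2).
    """
    ref, alt = reads
    p_alt = (err, 0.5, 1.0 - err)[g]  # chance that a single read shows alt
    return (p_alt ** alt) * ((1.0 - p_alt) ** ref)


def hw_prior(g, f):
    """Hardy-Weinberg genotype frequency for alt-allele frequency f."""
    return ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)[g]


def child_given_parent(gc, gp, f):
    """P(child genotype | one parent's genotype).

    One allele comes from the parent; the other is drawn from the
    population at frequency f (the second parent is not modeled).
    """
    t = gp / 2.0  # probability the parent transmits the alt allele
    return ((1 - t) * (1 - f), t * (1 - f) + (1 - t) * f, t * f)[gc]


def site_llr(child, cand, f):
    """Log-likelihood ratio at one site: parent-offspring vs. unrelated.

    Genotypes are never called; both hypotheses sum over all of them.
    """
    related = sum(
        hw_prior(gp, f) * read_lik(cand, gp)
        * child_given_parent(gc, gp, f) * read_lik(child, gc)
        for gp in range(3) for gc in range(3)
    )
    unrelated = (
        sum(hw_prior(g, f) * read_lik(child, g) for g in range(3))
        * sum(hw_prior(g, f) * read_lik(cand, g) for g in range(3))
    )
    return math.log(related) - math.log(unrelated)


def total_llr(sites):
    """Sum per-site evidence, treating sites as independent.

    sites is an iterable of (child_reads, candidate_reads, alt_freq).
    """
    return sum(site_llr(c, p, f) for c, p, f in sites)
```

In this sketch a site where either animal has no reads contributes exactly zero evidence, and a site with one or two reads contributes only a little. The binomial coefficient is dropped because it cancels in the ratio. A candidate would be favored only when the small per-site contributions, summed across many sites by `total_llr`, point consistently in one direction.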
But what about cost? Lead author Noah Snyder-Mackler gave the project the pet name “fecal alchemy” because it aims to transform poop into a data goldmine. But not every researcher can afford gold — most labs must use the cheapest tool that will get the job done. Tung says they included a cost analysis in the paper because they are regularly asked about the price of making the switch.
“Right now it costs about twice as much to produce 1x coverage of the entire baboon genome as it does to type 14 microsatellites. But the amount of information you get is much greater! So if you’re thinking in terms of cost per genotype, our method is way more cost effective. But in terms of absolute amounts it’s more expensive. In the end the cost-benefit decision depends on what questions you’re trying to answer,” says Tung. “Of course we’d like to get it even cheaper and more efficient and more robust. We’re working on it!”
This work was partly funded by the National Science Foundation DEB through an EAGER grant, with co-funding from NSF Biological Anthropology.
Snyder-Mackler et al. (2016). Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples. Genetics, 203(2), 699-714.