Mixed up: Insights into artificial sequencing chimeras
Sequencing a genome is not as simple as reading a book. All those neatly lined-up letters are the final product of a complex process made up of many intricate steps that can, and do, go wrong. In a report published in G3: Genes|Genomes|Genetics, Peccoud et al. put their painful sequencing experiences to good use, providing new insights into a common sequencing problem: artificial chimeras.
Sequencing typically requires cutting up genetic material into fragments. These fragments are amplified by PCR and then sequenced. The end result is millions of short sequences, called reads, which can be aligned to a reference sequence to identify changes such as recombination and mutations.
The authors of the G3 study originally set out to identify recombination events between dengue virus and its host mosquito. They sequenced RNA from virus-infected mosquito cells, and they added pillbug RNA to a separate batch to serve as a control. Unexpectedly, the authors found virus-mosquito and virus-pillbug recombinant reads at similar frequencies. Since the virus RNA had never been in contact with the pillbug RNA before the sequencing procedure, they concluded that most, if not all, of these recombination events must have happened during the amplification or sequencing steps.
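The logic of the control can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the authors' analysis: the function names, the normalization by total reads, and the counts below are all assumptions made up for the example. If chimeras that cannot be biological (virus-pillbug) turn up about as often as the ones of interest (virus-mosquito), then the estimated share of artifacts among the latter approaches one.

```python
def chimera_rate(chimeric_reads: int, total_reads: int) -> float:
    """Fraction of reads that join two different sources (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return chimeric_reads / total_reads


def estimated_artifact_fraction(sample_rate: float, control_rate: float) -> float:
    """Rough upper-bounded estimate of how many sample chimeras are artificial.

    Assumes artifacts arise at the same rate in sample and control, which is
    a simplification introduced for this sketch.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return min(1.0, control_rate / sample_rate)


# Placeholder counts, invented for illustration only; NOT data from the study.
virus_mosquito = chimera_rate(chimeric_reads=120, total_reads=1_000_000)
virus_pillbug = chimera_rate(chimeric_reads=110, total_reads=1_000_000)

fraction = estimated_artifact_fraction(virus_mosquito, virus_pillbug)
print(f"Estimated artifact fraction: {fraction:.2f}")
```

With rates this close, nearly all of the apparent virus-mosquito recombinants would have to be treated as suspect, which is the conclusion the authors reached from their own data.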
False positives are always disappointing, but instead of giving up, the authors used their data, along with data from previous studies, to better understand how the artificial reads arose and to learn how to filter them more effectively.
This investigation revealed certain characteristics that are shared by both real and fake recombinant reads, including microhomology around the recombination junction. Crucially, the authors found that biologically generated recombination almost always joins sequences in the same orientation, whereas artificial recombinant reads often join sequences in opposite orientations. They explain that this is likely due to template switching during the PCR step of sequencing.
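The orientation signal lends itself to a simple filter. The Python sketch below is a minimal illustration, assuming each recombinant read has already been split into two aligned segments with a known strand ("+" or "-"). The class and function names are hypothetical, and this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    reference: str  # which sequence this part of the read aligned to
    strand: str     # "+" or "-", as reported by the aligner


@dataclass(frozen=True)
class ChimericRead:
    name: str
    left: Segment   # part of the read before the junction
    right: Segment  # part of the read after the junction


def is_opposite_orientation(read: ChimericRead) -> bool:
    """True when the two halves of the read align to opposite strands."""
    return read.left.strand != read.right.strand


def split_by_orientation(reads):
    """Separate reads into (same-orientation, opposite-orientation) lists."""
    same, opposite = [], []
    for read in reads:
        (opposite if is_opposite_orientation(read) else same).append(read)
    return same, opposite


# Toy reads, invented for illustration only.
reads = [
    ChimericRead("read_1", Segment("virus", "+"), Segment("mosquito", "+")),
    ChimericRead("read_2", Segment("virus", "+"), Segment("mosquito", "-")),
    ChimericRead("read_3", Segment("virus", "-"), Segment("pillbug", "-")),
]

same, opposite = split_by_orientation(reads)
print("same orientation:", [r.name for r in same])
print("opposite orientation (likely artificial):", [r.name for r in opposite])
```

Note that the filter only works in one direction. An opposite-orientation junction is a strong hint of an artifact, but a same-orientation junction is not proof of real recombination: in the toy data, read_3 joins virus and pillbug sequences in the same orientation and still cannot be biological.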
Knowing the traits of false-positive reads may allow researchers to filter their data more carefully in future studies, ensuring they get the most accurate information possible. And knowing that what appears to be a dead end can still yield useful insights may help graduate students sleep better at night.