Short-read sequencing has fueled the acceleration of genetic research. But though these next-generation methods are fast and efficient, they can’t do everything well. One important area in which short reads fall short is detecting structural variants (SVs), in which chunks of the genome are deleted, inserted, repeated, inverted, or otherwise shuffled around compared to the reference sequence. Such variants play a significant role in natural genetic variation and cause many genetic diseases. But the global significance of structural variants is not well understood because they are harder to detect systematically than single nucleotide polymorphisms (SNPs), the type of variation targeted by short-read sequencing.
In the January issue of GENETICS, Mak et al. show that mapping long DNA fragments on nanochannel arrays can be used to survey structural variation across individual genomes. Applying the method to examine a human “trio” (mother, father, and child) from the 1000 Genomes Project, they identified seven times as many large insertions and deletions as had previously been found by sequencing approaches.
The method uses the commercial Irys technology to analyze large DNA fragments (greater than 150 kilobases). The DNA is first fluorescently labeled using an enzyme that nicks one of the strands every time it encounters a specific sequence motif. The labeled DNA fragments are then imaged in a nanochannel array. These arrays stretch out each DNA fragment uniformly within a tiny silicon groove, allowing the distances between neighboring site-specific labels to be measured.
The labeling patterns of the DNA molecules allow the overall structure of the fragment to be mapped. So, for example, deletion of a chunk of sequence might appear as loss of a signature pattern. In essence, the high-tech method is reminiscent of the trusty restriction map of the molecular genetics era—in which sequence-specific restriction enzymes cut DNA into defined fragments that could be used to infer the large-scale arrangement of sequence.
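To make the idea concrete, here is a minimal toy sketch in Python of how a label pattern can reveal a deletion. It is not the authors' pipeline. The motif, the toy sequences, and the function names are all assumptions chosen for illustration (the article does not name the enzyme or its recognition site), and real optical maps need error-tolerant alignment rather than the exact matching used here.

```python
# Toy sketch: compare label spacing between a reference and a sample.
# MOTIF is an illustrative placeholder, not a value taken from the article.
MOTIF = "GCTCTTC"

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def label_positions(seq, motif):
    """Sorted start positions of the motif on either strand."""
    hits = set()
    for target in {motif, reverse_complement(motif)}:
        start = seq.find(target)
        while start != -1:
            hits.add(start)
            start = seq.find(target, start + 1)
    return sorted(hits)

def intervals(positions):
    """Distances between neighboring labels."""
    return [b - a for a, b in zip(positions, positions[1:])]

def size_change(ref_iv, sample_iv):
    """Trim intervals that match at both ends, then compare what is left.

    A negative result suggests a deletion in the sample, a positive
    result an insertion. Exact matching only; no measurement noise.
    """
    shortest = min(len(ref_iv), len(sample_iv))
    i = 0
    while i < shortest and ref_iv[i] == sample_iv[i]:
        i += 1
    j = 0
    while j < shortest - i and ref_iv[-1 - j] == sample_iv[-1 - j]:
        j += 1
    ref_mid = ref_iv[i:len(ref_iv) - j]
    sample_mid = sample_iv[i:len(sample_iv) - j]
    return sum(sample_mid) - sum(ref_mid)

# Toy reference: four labels separated by spacers of increasing length.
reference = ("A" * 10 + MOTIF + "A" * 20 + MOTIF +
             "A" * 30 + MOTIF + "A" * 40 + MOTIF + "A" * 10)
# Toy sample: 50 bases removed, taking the third label with them.
sample = reference[:50] + reference[100:]

ref_iv = intervals(label_positions(reference, MOTIF))
sample_iv = intervals(label_positions(sample, MOTIF))
change = size_change(ref_iv, sample_iv)
print(ref_iv, sample_iv, change)
```

In this toy case the sample is missing one label and the flanking labels sit closer together, which is the kind of "lost signature" the paragraph above describes.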
One major advantage of the nanochannel method is speed, says senior author Pui-Yan Kwok (University of California, San Francisco). “Mapping usually involves cloning, so you’re making a library, picking colonies, doing digests, putting the pieces back together; it can take forever. With our method, pretty much everything is finished in days,” says Kwok.
Kwok’s group developed the original nick-labeling technique and published proof-of-principle experiments in 2012. The new paper demonstrates how the method can be implemented for mapping structural variation across a genome. The authors generated genome maps of cell lines from three individuals: the 1000 Genomes Project CEU trio, a family whose genomes have been extensively characterized by sequencing methods. Using genomes from parents and a child allows researchers to cross-check that variants show Mendelian inheritance and to identify haplotypes (the sets of variants that lie together on one chromosome of a pair). The genome maps revealed more than 1500 insertions and deletions greater than 5 kilobases in size, compared to the 215 previously found in the well-studied trio. Among the new variants, the team identified five deletions that may influence disease susceptibility. These deletions are homozygous (present in both copies of the genome) and remove all or substantial parts of genes associated with susceptibility to cancers, psoriasis, and certain bacterial infections, and with resistance to malaria.
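The Mendelian cross-check mentioned above can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the procedure used in the paper: genotypes are coded as the number of copies of a variant (0, 1, or 2), only autosomal sites are considered, new mutations are ignored, and the function names are hypothetical.

```python
# Toy Mendelian consistency check for a mother-father-child trio.
# Genotype coding (an assumption for this sketch):
#   0 = variant absent, 1 = heterozygous, 2 = homozygous.

def transmissible(genotype):
    """Variant copies (0 or 1) that a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def is_mendelian(mother, father, child):
    """True if the child's genotype can be built from one allele per parent."""
    return any(m + f == child
               for m in transmissible(mother)
               for f in transmissible(father))

# Two heterozygous parents can have a homozygous child.
print(is_mendelian(1, 1, 2))
# A variant seen in the child but in neither parent fails the check.
print(is_mendelian(0, 0, 1))
```

A call that fails this check is more likely to be a detection error than a true new variant, which is why trios are useful for validating a new method.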
While the results are impressive, the method has its limitations, Kwok cautions. First, it provides only a bird’s-eye view of the genome, not the base-pair resolution delivered by sequencing. That means other methods will still be needed to home in on important details. Second, the method can’t analyze DNA samples isolated by standard techniques because sufficiently long DNA fragments can only be prepared by starting with intact cells. The method also can’t detect all kinds of structural variants. For instance, it can’t identify those smaller than about five kilobases. “There are also some places in the genome that are just too complex even for this method,” says Kwok.
Nonetheless, there are many potential applications. “Many of the plant and animal genome researchers are interested in this approach, to look for big structural variants without needing a lot of other experiments,” says Kwok. His group has been developing clinical applications for analyzing microdeletion syndromes like DiGeorge syndrome, which are diseases caused by relatively small structural variants that are not easy to dissect with conventional approaches.
But the overall goal, says Kwok, is to combine the long-range information provided by nanochannel-based genome mapping with the high resolution of short-read sequencing, while bridging the gap with intermediate methods (e.g., approaches that yield longer sequence reads or that can resolve whole-genome haplotypes). By combining short-, medium-, and long-range approaches, Kwok’s group aims to efficiently generate high-quality genome assemblies with single-base-pair resolution.
“I think we need a more complete picture of the genome,” says Kwok. “People often go for the fastest and cheapest technology rather than the technology that will give them the best information. It’s important to step back a bit and ask: ‘What am I trying to learn from this genome?’”
Mak, A. C., Lai, Y. Y., Lam, E. T., Kwok, T. P., Leung, A. K., Poon, A., … & Andrews, W. (2016). Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays. Genetics, 202(1), 351-362. doi: 10.1534/genetics.115.183483