Assembling a Colossus

The loblolly pine genome is big. Bloated with retrotransposons and other repetitive sequences, it is seven times larger than the human genome and easily big enough to overwhelm standard genome assembly methods.

This forced the loblolly pine genome sequencing team, led by David Neale at the University of California, Davis, to look for ways to reduce the enormous complexity of their task.

The draft genome sequence, described in the latest issue of GENETICS and the journal Genome Biology, was pieced together from over 16 billion sequence reads. Spanning around 23 billion base pairs, it only just beats out the Norway spruce as the largest genome ever sequenced, but it is substantially more complete. For example, the N50 scaffold size of the current loblolly assembly is 66.9 Kbp, compared to 0.72 Kbp in the Norway spruce.
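
For readers unfamiliar with the statistic, N50 is the scaffold length at which scaffolds of that length or longer contain at least half of all assembled bases, so a larger N50 means a less fragmented assembly. The short Python sketch below shows one common way to compute it. The function name `n50` and the toy lengths are illustrative assumptions, not values from either assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total past half
        # of the assembly defines the N50.
        if running >= half_total:
            return length


# Toy example with made-up scaffold lengths (total = 54, half = 27).
toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_scaffolds))  # 8, because 10 + 9 + 8 = 27
```

Because the statistic depends only on the sorted lengths, it says nothing about whether the scaffolds are correct, only about how contiguous they are.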

So how did they do it?

One strategy was to generate most of the sequence from part of a single pine nut. This tiny source material was the megagametophyte, which is the haploid tissue that provides nutrients to the developing diploid embryo. Despite the limited amount of DNA that can be extracted from this source, the reduced complexity of a haploid genome makes it easier to assemble. To link up all the sequence fragments from the haploid genome, the team also created DNA libraries from diploid needles of the parent genotype.

But this still left the assembly team, led by Steven Salzberg at Johns Hopkins University and James Yorke at the University of Maryland, with more data than their computational methods could handle.

The solution was to pre-process the data into “super reads”: larger chunks of contiguous haploid sequence, each condensing many individual reads. In essence, the team dealt with the unambiguous parts of the problem first and got rid of a huge amount of overlapping and redundant data in the process.

The result was a 100-fold reduction in the amount of megagametophyte sequence that needed to be held in the memory of the assembly computer. That kind of reduction is not just handy for giant genomes; Salzberg says it also speeds up projects of more modest scale.
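
The general idea of condensing reads can be illustrated with a toy Python sketch. It is a deliberately simplified illustration, not the team's assembler: it assumes error-free reads from a single strand, ignores reverse complements, and uses hypothetical helper names (`kmer_set`, `unique_neighbor`, `extend_read`, `super_reads`). Each read is extended one base at a time for as long as the k-mers seen in the data allow only one possible next base; reads that grow into the same string collapse into a single super read.

```python
def kmer_set(reads, k):
    """Collect every k-mer that occurs in any read."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def unique_neighbor(kmer, kmers, forward):
    """Return the only k-mer that can follow (or precede) `kmer`, else None."""
    if forward:
        candidates = [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in kmers]
    else:
        candidates = [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in kmers]
    return candidates[0] if len(candidates) == 1 else None


def extend_read(read, kmers, k):
    """Extend a read in both directions while the extension is unambiguous."""
    seen = {read[i:i + k] for i in range(len(read) - k + 1)}
    while True:  # extend to the right
        nxt = unique_neighbor(read[-k:], kmers, forward=True)
        if nxt is None or nxt in seen:  # ambiguous, dead end, or a cycle
            break
        seen.add(nxt)
        read += nxt[-1]
    while True:  # extend to the left
        prv = unique_neighbor(read[:k], kmers, forward=False)
        if prv is None or prv in seen:
            break
        seen.add(prv)
        read = prv[0] + read
    return read


def super_reads(reads, k):
    """Collapse reads into the set of distinct extended sequences."""
    kmers = kmer_set(reads, k)
    return {extend_read(r, kmers, k) for r in reads if len(r) >= k}


# Four overlapping toy reads (made up for illustration) from one short sequence.
toy_reads = ["ATGGCGT", "GGCGTAC", "CGTACCT", "TACCTGA"]
print(super_reads(toy_reads, k=4))  # {'ATGGCGTACCTGA'}
```

In this toy case four reads condense into one sequence. Real data contain sequencing errors and repeats, which create ambiguous extensions and stop the growth of a super read much earlier.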

Luckily, says Salzberg, the loblolly genome project wasn’t held back by the masses of repeats that are typical of conifers. Even though around 82% of the loblolly pine genome is repetitive, it turns out that most of the repeats are evolutionarily ancient. That means they have diverged enough to no longer be a big stumbling block for assembly.

All this is good news for sequencing other conifer species, especially since the team is already tackling an even larger behemoth: the 35-gigabase genome of the sugar pine.

Check out the loblolly genome articles and other highlights of this month’s GENETICS.

Zimin A., Stevens K.A., Crepeau M.W., Holtz-Morris A., Koriabine M., Marcais G., Puiu D., Roberts M., Wegrzyn J.L., de Jong P.J., et al. (2014). Sequencing and Assembly of the 22-Gb Loblolly Pine Genome. Genetics, 196 (3) 875-890. DOI: 10.1534/genetics.113.159715

Wegrzyn J.L., Liechty J.D., Stevens K.A., Wu L.S., Loopstra C.A., Vasquez-Gross H.A., Dougherty W.M., Lin B.Y., Zieve J.J., Martinez-Garcia P.J., et al. (2014). Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation. Genetics, 196 (3) 891-909. DOI: 10.1534/genetics.113.159996

Neale D.B., Wegrzyn J.L., Stevens K.A., Zimin A.V., Puiu D., Crepeau M.W., Cardeno C., Koriabine M., Holtz-Morris A.E., Liechty J.D., et al. (2014). Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome Biology, 15 (3) R59. DOI: 10.1186/gb-2014-15-3-r59

Cristy Gelling is a science writer, lapsed yeast geneticist, and former Communications Director at the GSA.
