Today’s guest post was contributed by Caitlan Rossi, a scientific and medical writer. Her work can be found at caitlanrossi.com.
Sudhir Kumar, Laura H. Carnell Professor of Biology and Director of the Institute for Genomics and Evolutionary Medicine at Temple University, wants to make universal participation in evolutionary genetics possible. When he was a graduate student, his interest in comparative analysis of molecular sequences from an evolutionary perspective and his aptitude for computation led him to develop a new software package called Molecular Evolutionary Genetics Analysis (MEGA), a user-friendly platform for conducting evolutionary analyses using sophisticated methods and models. His group later created the online database TimeTree, which presents a comprehensive view of the vast body of knowledge on species divergence times. “It’s one thing to have novel approaches,” said Kumar, explaining the importance of disseminating knowledge bases with an approachable interface, “…but we need a way to use them. Both must be made hand in hand for effective and forward-looking science.”
MEGA by word of mouth
A student at Penn State in 1991, Kumar wasn’t yet thinking about disseminating tools to make science accessible to all. However, he had started to explore a better way to perform comparative analysis of molecular sequences, drafting what would become the earliest iteration of MEGA. “As a graduate student, you’re not thinking about scientific enterprise for all!” Kumar said. “It was a personal project for personal use. The interface challenge was basically, can I use it?” The most important design principle was that the tool be easy to operate—that a user could visualize the series of sequences they had input, analyze the genetic distance they had computed, and ultimately use the phylogeny they had created. With the help of his collaborators, Kumar began to distribute the software (at this stage on a floppy disk) to other researchers. “We used the snail mail, stuffing an envelope with a diskette and a manual!” laughed Kumar, joking that sometimes it felt more challenging to distribute MEGA than to build it. But science is a community, and news of the novel tool spread quickly by word of mouth: by 1998, almost 2,500 copies of MEGA had been shared.
Only the strong software survives
Inspired by the software’s growing following and the clear need for this type of solution, Kumar began working on improvements. MEGA 2, developed in 2001, used more memory and offered better data handling, new methods, and increased efficiency. Today, MEGA is in its 12th major release and is on track to be downloaded 400,000 times this year, with 300,000 downloads by students and another 100,000 by researchers. Last year alone, the software garnered 18,000 citations. In true Darwinian fashion, MEGA has evolved over time to incorporate new technological advances. “Tools like MEGA, which have been modernized constantly to leverage computing resources better and better, have survived the test of time,” said Kumar. The current iteration of MEGA allows users to build sequence alignments, explore data and results interactively, infer evolutionary trees and ancestral sequences, estimate distances and diversity, compute timetrees, and test for selection.
Surprising applications and adaptations
Indeed, MEGA continues to take its own adaptive walk to become better suited to the scientific landscape. While Kumar created the software to help analyze data from humans, mammals, and their backboned relatives, the rise of genomics, sequencing, and big data has made the tool increasingly valuable across diverse species. “A lot of biomedical fields use it more than I ever imagined. I think it speaks volumes about the importance of evolutionary genetics and GSA’s work to promote it,” said Kumar. Beyond the molecular evolution of animals, MEGA has been used extensively in fields spanning virology, bacteriology, microbiology, environmental science, systematics, developmental and pathogen evolution, plant biology, and conservation biology. It has been applied to study HIV/AIDS, cancer and tumor variation, and, more recently, coronavirus.
Kumar explained, “This is the importance of evolutionary genetics and genomics in the world of science: informing our understanding of the ultimate causes of modern disease, what I call phylomedicine—how phylogeny or genetic variation informs medicine and disease as treatment and intervention—because we can ultimately learn so much from genomes, ours and pathogens’, and we can use tools like MEGA to isolate and apply that knowledge. Our investigations are becoming more and more relevant and more translational as time progresses, so that’s a wonderful thing to see—to make some contribution that has a lasting impact.”
A tool for scientists and students
Building on the success of MEGA, Kumar and colleagues developed TimeTree to widely disseminate the growing knowledge derived from molecular dating. In the genetic practice of putting a timestamp on ancient events, different researchers publish differing estimates, and this spectrum of data can be challenging to access in the scientific literature. Kumar saw the value of a database that could hold these published estimates collectively, where a user could query any species pair and quickly retrieve results without having to pore over the literature specific to a particular domain. The solution speaks to his commitment to breaking down barriers to scientific inquiry. “We wanted to break that boundary between scientific knowledge locked up in all these published articles and other scientists, non-experts in the field, and all the people on earth with curiosity and eagerness to learn,” said Kumar. As he explained, TimeTree is a platform where a postdoc might go to find a synthesis of data on species divergence or where a high school biology student might ask, “When did I share a common ancestor with a tomato, based on genetics?” A tool for open science, TimeTree accepts both common names and species names, extending evolutionary knowledge to everyone, including non-geneticists.
A commitment to green computing
MEGA was created at a time when memory and processing capacity were limited. Today, we have the luxury of faster computers and increased memory, but we also have a staggering amount of data to sift through. The concept of green computing arose with the need for environmentally sustainable computing practices that limit energy consumption. “The need to analyze the data is outpacing the speed of computers and the amount of computing available, so it’s actually even more relevant today to have faster and accurate programs if we want to make research enterprise accessible to all,” Kumar explained.
To Kumar, green computing is not only the best way to conduct sustainable science but also a way to level the playing field between juggernaut research institutions and equally interested but less well-resourced scientists and students. “It’s difficult for most people to participate in the big data science revolution,” he said. “One way to break that chain is to develop fast methods that require less memory, can handle big data, and produce results comparable in accuracy to memory-hungry, time-hungry methods.” Kumar argues that making genomics more accessible by developing community resources (the greener, the better) is one of the most worthwhile goals a computational geneticist can pursue. “It will not only help us re-analyze big data but increase scientific rigor,” he said. “Leverage data brain power to have more people involved in research. To me, it’s a no-brainer that we have to be greener.”
Join us in congratulating Sudhir Kumar on receiving the George W. Beadle Award for his extensive scientific contributions and outstanding efforts to make evolutionary genetics accessible to all.