How bioinformatics can help fill the therapeutic drug pipeline

Written by members of the GSA Early Career Scientist Communication and Outreach Subcommittee: Angel F. Cisneros Caballero, Université Laval; Adelita Mendoza, PhD, Washington University; Narjes Alfuraiji, University of Manchester; Anna Bajur, Max Planck Institute of Molecular Cell Biology and Genetics

During the current global pandemic, public attention is increasingly falling on the process of drug discovery and development. How exactly do we find new treatments? And what does it take to bring them to the clinic? One powerful tool in this process that often escapes notice is bioinformatics—the use of computational resources to answer biological questions.

Exponential increases in computational power have revolutionized the way we do science. Over time, this has created entirely new fields of research, since we can now analyze more data efficiently and explore more complex algorithms and models¹. Bioinformatics is one of the fields made possible by this technological achievement, and it has been critical for many recent scientific advances².

Bioinformatics comprises two interdisciplinary sub-fields that interface with computer science, mathematics, and biology. One is the research and development work that builds the models modern biology requires. The other is computational biology, which is dedicated to answering basic biological questions.

Bioinformatics is not just an academic field; it has many clinical applications. For example, we now have the technology to sequence genomes and identify genes involved in diseases, such as cancers. However, we can only do it accurately by looking at short segments at a time. Sequencing an organism’s genome becomes like a giant puzzle with thousands of pieces, and only bioinformatic methods allow us to assemble the pieces.
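To make the puzzle analogy concrete, here is a minimal sketch of one simple assembly strategy: repeatedly merging the two fragments that overlap the most. The reads and the function names (`overlap`, `greedy_assemble`) are made up for illustration, and real assemblers use far more sophisticated methods to cope with sequencing errors and repeated sequences.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Four toy "reads" (hypothetical data) cut from one short sequence.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA", "GTTAGCC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGCC']
```

With only four reads the answer is easy to see by eye; the point is that the same logic has to be applied to millions of fragments, which is only feasible computationally.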

Bioinformatics can also be used to guide drug design experiments and maximize the chances of finding active molecules. This new knowledge can eventually be used to develop therapies and vaccines to save human lives. Here, we will look at some examples of how we can use bioinformatics to discover molecular signposts for particular biological processes. These signs are known as biomarkers, and they are important in all types of clinical research. We will then take a closer look at how bioinformatics can use this information to come up with an application, such as a drug.

Biomarkers of regeneration

Humans do not have the ability to regenerate limbs after amputation, but certain animals have this extraordinary ability, including planarian flatworms and axolotls. To understand these strong regenerative capabilities, scientists study fruit flies, flatworms, axolotls, and zebrafish. These species are powerful model systems to study tissue regeneration after amputation or damage. As in most biological fields, modern-day bioinformatics techniques are playing a key role in understanding how the genome responds to injury.

Regeneration requires a real-time genomic response, which can be studied by looking at which genes are activated or repressed in individual cells with single-cell RNA sequencing. A recent study from Fincher et al. identified flatworm genes that were active after injury by analyzing all messenger RNA (the transcriptome) of individual lineage precursor cells with Drop-seq. This technique isolates single cells in droplets so that they can be separately analyzed and compared. This method is so powerful that researchers were able to detect the transcriptome from cell types with frequencies as low as ~10 cells per animal³.

Bioinformatic analyses allowed the cells to be clustered into groups with similar gene expression across different tissue types, which in turn allowed the researchers to build an atlas of the genes expressed after injury.
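As a rough illustration of what clustering cells by gene expression means, the sketch below groups five imaginary cells using the expression of two imaginary genes. It uses plain k-means as a stand-in; the clustering method used in the study is not described here, and the cell values and the `kmeans` helper are assumptions made for this example.

```python
import math


def kmeans(points, initial_centroids, n_iter=10):
    """Plain k-means: assign each point to its nearest centroid, then update."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical expression levels of (gene_1, gene_2) in five cells.
cells = [(9.0, 0.5), (8.5, 1.0), (0.4, 7.8), (1.1, 8.2), (9.2, 0.2)]

# Start from two of the cells as initial cluster centers.
labels = kmeans(cells, initial_centroids=[cells[0], cells[2]])
print(labels)  # [0, 0, 1, 1, 0]
```

Cells with the same label have similar expression profiles. A real single-cell dataset has thousands of genes per cell rather than two, which is why dedicated computational tools are needed.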

In another example, Vizcaya-Molina et al. identified novel enhancers that regulate gene activation during different phases of recovery from injury in developing fruit flies. The researchers looked for accessible regions in the DNA (which are associated with higher gene activation) using a technique called ATAC sequencing. They confirmed that parts of the transcriptome changed in response to injury, and they then wanted to know whether the affected genes had common functions. With the help of bioinformatic databases, they found that many of those genes belonged to signaling pathways involved in cell growth and differentiation⁴.
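One common way to ask whether a list of genes shares a function more often than chance would predict is a hypergeometric test. The sketch below shows the idea; we do not know that the study used this exact test, and the gene counts and the function name `enrichment_p_value` are invented for illustration.

```python
from math import comb


def enrichment_p_value(genome_size, pathway_size, hits, hits_in_pathway):
    """Probability of seeing at least `hits_in_pathway` pathway genes
    when `hits` genes are drawn at random from the genome."""
    total = comb(genome_size, hits)
    upper = min(pathway_size, hits)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10,000 genes in the genome, 100 of them annotated to a
# growth pathway, 50 injury-responsive genes, 5 of which are in the pathway.
p = enrichment_p_value(genome_size=10_000, pathway_size=100, hits=50, hits_in_pathway=5)
print(f"p = {p:.2e}")
```

By chance we would expect only about 0.5 pathway genes among the 50, so finding 5 gives a very small p-value, which suggests that the pathway is genuinely over-represented.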

A study by Goldman et al. uncovered the genetic regulatory program that responds to injured cardiomyocytes in zebrafish. Inaccessible regions of DNA are tightly wrapped around proteins called histones. The researchers looked at profiles of H3.3, a replacement histone that indicates transcriptional accessibility, to uncover gene regulatory elements involved in heart regeneration. This method allowed them to identify genes that were upregulated in response to injury. Later, during cardiomyocyte regeneration, they found an enrichment of enhancer elements that were “open” for transcription and then identified the specific sequence involved during regeneration⁵.

These examples show how bioinformatics helps to unlock the mysteries of the genes that regulate regeneration after injury. Its techniques can be applied to monitor the real-time genomic response in individual cells and to probe accessible regions of DNA in several organisms that are capable of regeneration. The greater computational power that bioinformatics provides will allow scientists to ask new questions that are important to the field of regeneration.

Biomarkers of virulence factors

Bioinformatic tools are also important in finding biomarkers of infectious disease virulence, which can be appealing candidate drug targets. For instance, we can look for specific genes that drive the pathogenicity of a given microorganism, such as yeast. To do this, we can design strains that lack particular genes and evaluate whether this makes them less pathogenic. Large numbers of yeast strains are typically tested using competitive growth methodologies⁶. For example, Han et al. evaluated the growth of each mutant strain under controlled conditions of direct competition with other mutants, thus reducing the time and cost associated with screening each one individually. This enabled a large number of strains to be screened to identify a drug target.

One example of functional genomics being used to identify drug targets in pathogenic fungi is the C. albicans fitness test (CaFT), carried out in Candida albicans. In this test, each isolate is assigned a unique identifier (barcode) that can be tracked computationally to reveal differences in fitness among heterozygote isolates. This enabled the researchers to screen for loss of gene function in the presence of antifungal agents, from which they identified the mechanism of action of novel compounds⁷.

Competitive fitness profiling was also used to evaluate the relative fitness of large pools of Aspergillus fumigatus mutants, using a non-genetically barcoded library of mutants, to identify those involved in virulence⁸. As a result, the researchers reduced the total number of animals usually required for virulence screening. Tn-Seq is another technique, used to assess the contribution of genes to fitness in Streptococcus pneumoniae. However, instead of deleting the gene, Tn-Seq inserts additional DNA within it⁹.

Similarly, changes in mutant frequency can be used to compare the fitness of the different mutants. By looking at which mutants grow most poorly, scientists can identify which genes are the most essential and consider them as potential drug targets. This is of particular interest in drug discovery programs, since identifying the genes that are responsible for or involved in pathogenicity is crucial to designing and developing a novel therapy.
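The comparison of mutant frequencies described above can be sketched in a few lines. The example below takes made-up barcode counts from a pooled culture before and after competitive growth and computes a log2 fold change in relative frequency for each strain. The strain names, the counts, and the function `log2_fold_changes` are all hypothetical, and published pipelines add normalization and statistics that are omitted here.

```python
import math


def log2_fold_changes(before, after, pseudocount=1):
    """Log2 change in each strain's share of the pool during competition."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    scores = {}
    for strain, count in before.items():
        # The pseudocount avoids taking the log of zero for strains that vanish.
        freq_before = (count + pseudocount) / total_before
        freq_after = (after.get(strain, 0) + pseudocount) / total_after
        scores[strain] = math.log2(freq_after / freq_before)
    return scores


# Hypothetical sequencing read counts per barcode.
before = {"geneA_del": 1000, "geneB_del": 1000, "geneC_del": 1000}
after = {"geneA_del": 1500, "geneB_del": 1400, "geneC_del": 100}

scores = log2_fold_changes(before, after)
for strain in sorted(scores, key=scores.get):
    print(f"{strain}: {scores[strain]:+.2f}")
```

The strain with the most negative score (here `geneC_del`) dropped out of the pool fastest, so the deleted gene is the strongest candidate for being important under the tested condition.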

Drug design

Once we have found the optimal drug target, we can turn to bioinformatics again to help us find a drug for it. A classic approach is to generate millions of molecules experimentally, test them, and register the ones that have an effect. However, this method is very time-consuming and resource-intensive, and the number of effective molecules can be low. Instead, we can use our models of molecular interactions to test molecules computationally and then test experimentally only the ones that are predicted to be effective. This allows us to narrow down the set of molecules to test in an experiment while maximizing the chance of success. Indeed, Doman et al. showed that computational tests increase the efficiency of these experiments. When they screened a large library of molecules, only 0.02% of their tests were positive. However, when they used a computational analysis to evaluate only the ones predicted to be effective, 35% of their tests were positive¹⁰. Thus, virtual screening saves a considerable amount of time and money by reducing the number of assays while raising the proportion that succeed. In fact, there are several examples of drugs discovered with the help of computational methods that have been approved by the FDA. These include dorzolamide to treat glaucoma, captopril to treat hypertension, and saquinavir to treat HIV¹¹. Moreover, these approaches are being used in the context of the current COVID-19 pandemic to find potential new treatments.
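To put the two hit rates reported by Doman et al. side by side, the short calculation below converts them into a fold enrichment and into the average number of assays needed per active molecule. Only the two percentages come from the text above; the function names are ours.

```python
def fold_enrichment(baseline_rate_pct: float, enriched_rate_pct: float) -> float:
    """How many times higher the enriched hit rate is than the baseline."""
    return enriched_rate_pct / baseline_rate_pct


def assays_per_hit(rate_pct: float) -> float:
    """Average number of experimental tests needed to find one active molecule."""
    return 100 / rate_pct


screening_rate = 0.02  # % positives when screening the whole library
docking_rate = 35.0    # % positives among computationally selected molecules

print(round(fold_enrichment(screening_rate, docking_rate)))  # 1750
print(round(assays_per_hit(screening_rate)))                 # 5000
print(round(assays_per_hit(docking_rate), 1))                # 2.9
```

In other words, about 5,000 experiments per hit shrink to about 3 when the candidates are chosen computationally first.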

All potential drugs should be subjected to multiple stages of evaluation to assess their safety, first in preclinical tests with model organisms and then in clinical studies in humans. Despite the promise of computational methods to help identify active molecules, most candidates fail to pass these clinical studies because of unwanted side effects. Thus, one of the newest endeavors in the field is the use of machine learning to add predictions of how likely a given molecule is to be toxic. Machine learning is a set of tools that find trends in known data to predict the results of future observations¹².

Currently, these methods look at databases of molecules to extract their physical properties and the health concerns associated with them. Then, they build models that link those properties to health concerns to derive general rules. These approaches have been very successful, with some models being able to identify toxic compounds with up to 95% accuracy.
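The idea of linking molecular properties to known outcomes can be illustrated with a very small nearest-neighbor classifier: a new molecule is given the label that is most common among the known molecules it most resembles. This is a toy sketch and not one of the published models; the two descriptors, their values, the labels, and the function `predict_toxic` are all invented for illustration.

```python
import math
from collections import Counter


def predict_toxic(training, query, k=3):
    """Majority vote among the k known molecules closest to the query."""
    neighbours = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (descriptor_1, descriptor_2) -> known to be toxic?
training = [
    ((4.8, 4.5), True),
    ((5.2, 5.1), True),
    ((4.5, 4.9), True),
    ((1.2, 1.8), False),
    ((0.8, 2.2), False),
    ((1.5, 2.0), False),
]

print(predict_toxic(training, (5.0, 4.8)))  # True
print(predict_toxic(training, (1.0, 2.0)))  # False
```

Real models are trained on thousands of compounds described by many more properties, and their accuracy has to be checked on molecules that were not used for training.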

Gaining access to greater computational power has allowed us to pursue new questions and develop further techniques to address them. This has had a notable impact on diverse fields, from basic science to applications in the clinic. The future of bioinformatics will certainly be exciting, as it will likely produce more and more results that have an impact on our daily lives.

References:

1. Edgar, T. W. & Manz, D. O. Research Methods for Cyber Security. (Syngress, 2017).
2. Gauthier, J., Vincent, A. T., Charette, S. J. & Derome, N. A brief history of bioinformatics. Brief. Bioinform. (2018). doi:10.1093/bib/bby063
3. Fincher, C. T., Wurtzel, O., de Hoog, T., Kravarik, K. M. & Reddien, P. W. Cell type transcriptome atlas for the planarian Schmidtea mediterranea. Science 360, (2018).
4. Vizcaya-Molina, E. et al. Damage-responsive elements in Drosophila regeneration. Genome Res. 28, 1852–1866 (2018).
5. Goldman, J. A. et al. Resolving Heart Regeneration by Replacement Histone Profiling. Dev. Cell 40, 392–404.e5 (2017).
6. Han, T. X., Xu, X.-Y., Zhang, M.-J., Peng, X. & Du, L.-L. Global fitness profiling of fission yeast deletion strains by barcode sequencing. Genome Biol. 11, R60 (2010).
7. Xu, D. et al. Genome-wide fitness test and mechanism-of-action studies of inhibitory compounds in Candida albicans. PLoS Pathog. 3, e92 (2007).
8. Macdonald, D. et al. Inducible Cell Fusion Permits Use of Competitive Fitness Profiling in the Human Pathogenic Fungus Aspergillus fumigatus. Antimicrob. Agents Chemother. 63, (2019).
9. Solaimanpour, S., Sarmiento, F. & Mrázek, J. Tn-seq explorer: a tool for analysis of high-throughput sequencing data of transposon mutant libraries. PLoS One 10, e0126070 (2015).
10. Doman, T. N. et al. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J. Med. Chem. 45, 2213–2221 (2002).
11. Sliwoski, G., Kothiwale, S., Meiler, J. & Lowe, E. W. Computational Methods in Drug Discovery. Pharmacol. Rev. 66, 334–395 (2014).
12. Yang, H., Sun, L., Li, W., Liu, G. & Tang, Y. In Silico Prediction of Chemical Toxicity for Drug Design Using Machine Learning Methods and Structural Alerts. Front. Chem. 6, 30 (2018).

