GenAI ushers in a new era of drug research

The use of generative artificial intelligence in protein design stands to revolutionize new drug development. EPFL aims to bring together a consortium to explore this avenue further.
Spike. © Emphase/EPFL

Proteins are found in all living things. They play a key role in cell structure, nutrition and health, as well as in how drugs interact with the body.

Recent advances in protein design stand to usher in a new era of drug research. At the forefront of this revolution is generative artificial intelligence (GenAI), which can design entirely new kinds of proteins. Imaging methods such as X-ray crystallography and cryogenic electron microscopy also play a key role, as they let scientists observe the three-dimensional structure of real-world proteins with unprecedented precision. Combining these technologies could pave the way for novel processes, allowing researchers to develop, among other things, innovative biologic medications, often called biologics.

A close-up view of biomolecules

Our current understanding of how proteins and cells interact is based on empirical data gathered through years of biomedical research. To take just one example, we know full well the role that insulin plays in glucose metabolism. But countless other protein-cell interactions remain a mystery – as do the reasons and mechanisms behind disease-causing protein malfunctions.

The emergence of new methods and technologies is rapidly expanding the body of scientific knowledge. Cryogenic electron microscopy, a method practiced and developed at the EPFL-UNIL Dubochet Center for Imaging, allowed researchers to observe in vitro how the spike protein of the SARS-CoV-2 Omicron variant interacts with receptors on the surface of human cells. This offered insights into both the rapid spread of the virus within the body and its ability to evade the antibodies induced by vaccines developed for earlier variants.

Spike


Known as the spearhead that enables the SARS-CoV-2 virus to penetrate human cells, the spike protein rose to fame during the pandemic. It binds to ACE2 receptor proteins on the membranes of certain cells (including those in our respiratory tract), opening the door for the virus to enter. It consists of three identical chains that protrude from the viral envelope. The spike protein is a glycoprotein: it is coated with sugars that are produced by the infected human cells themselves. Because our immune system tends to treat these sugars as its own, a dense enough coating acts as an “invisible cape,” making the virus much harder for our immune system to detect.

The spike protein is a prime target for our immune system when fighting off an infection, and vaccines are a powerful ally in this fight. Scientists used a variety of methods to develop SARS-CoV-2 vaccines. One involved synthesizing and purifying the virus’s spike protein, which was then attached to nanoparticles and administered by injection. Because the spike protein is recognized as a foreign substance, the vaccine prompts the recipient’s immune system to produce antibodies against it. In the case of mRNA vaccines, what is administered is not a replica of the spike protein but the protein’s “blueprint” in the form of mRNA. This enables the recipient’s own cells to synthesize the spike protein, against which the immune system then develops specific antibodies.
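To make the idea of an mRNA “blueprint” more concrete, here is a minimal Python sketch, assuming the standard genetic code, of how a cell reads an mRNA sequence three letters (one codon) at a time to assemble a protein. The names `CODON_TABLE` and `translate` are hypothetical, chosen for illustration, and the example sequence is a short toy sequence, not the real spike gene. Real translation by ribosomes involves far more machinery than a lookup table.

```python
# Standard genetic code: each three-letter RNA codon maps to one amino acid
# (one-letter code); "*" marks a stop codon. Codons are enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate an mRNA sequence into a one-letter protein sequence.

    Reading starts at the first AUG (start codon) and ends at the first
    stop codon or at the end of the sequence. Unknown letters raise KeyError.
    """
    mrna = mrna.upper().replace("T", "U")  # also accept DNA-style input
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    # Step through complete codons only (pos + 3 must not run past the end).
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break  # stop codon: the protein chain is complete
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Toy sequence: AUG UUU GUU UUC CUG, then the stop codon UAA.
    print(translate("GGAUGUUUGUUUUCCUGUAAGC"))  # prints MFVFL
```

The lookup table is built from the compact 64-letter string rather than typed out codon by codon, which makes it easy to check against published genetic-code tables.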

Applying deep learning to life

Similarly rapid progress is being made in another field: the application of machine learning to the life sciences. Half of the 2024 Nobel Prize in Chemistry went to David Baker, an American pioneer of computational protein design, and the other half jointly to Demis Hassabis (an EPFL Doctor Honoris Causa laureate) and John M. Jumper, who together developed AlphaFold, a multi-award-winning AI model for predicting the structure of proteins.

Designing new biomolecules

EPFL is also highly active in protein design. For more than five years, the School’s Laboratory of Protein Design & Immunoengineering, headed by Bruno Correia, has been using machine learning to predict how proteins interact with their receptors. “The use of deep learning in biological engineering is opening up exciting new opportunities,” says Correia.

While this groundbreaking work is furthering our understanding of how living organisms function, it also marks the start of a nascent revolution in drug research. When generative AI programs, built on the same family of technology as ChatGPT, are trained on protein and molecular-interaction data generated by researchers and by models such as AlphaFold, they can design and model entirely new types of molecules, in countless forms, and simulate their interactions with cells. These programs can evaluate enormous numbers of candidates computationally until they find molecules of potential relevance for drug development. “This new approach will be nothing short of a paradigm shift for the entire field of biotechnology,” adds Correia.


From planning to reality

Designing a protein in silico is only the first step: it must then be produced in the lab. There are various ways of producing existing or hitherto unknown proteins on demand. That’s what Florence Pojer and her research group are doing at EPFL’s Protein Production and Structure Core Facility (PTPSP), where bottles containing reddish liquids are shaken in glass cabinets for hours on end. “For instance, these bottles contain human embryonic kidney (HEK) cells, which have been immortalized and cultured for decades,” says Pojer. “We use them to make proteins such as antibodies, after first transfecting the cells with plasmids containing the desired sequence.”

Scientists at PTPSP also grow other types of cell and bacterial cultures, depending on the results they’re aiming to achieve. The resulting solution is then purified to isolate the target proteins. “In theory, it’s possible to produce any protein from its genetic sequence,” she adds. “But as things currently stand, only a tiny fraction of proteins designed in silico, by computers, can actually be made and function in the real world. The idea behind novel biotechnology approaches is to expand the range of what we can produce in the future.”

Much of this innovative technology is being applied or developed at EPFL, not just by Correia and his research group but also in the lab headed by Sebastian Maerkl. There, rather than relying on biological processes in living cells, researchers produce proteins in vitro, using only the 30 or so enzymes actually needed for protein production. Meanwhile, Matteo Dal Peraro’s research group uses observation, modeling and simulation to study large macromolecular systems and how their functions are determined by their structure and composition.

A vast consortium in the making

Various complementary research projects are currently under way at schools and universities across Switzerland. At EPFL, Correia and Beat Fierz are building a consortium aimed at ushering in a new era of drug research powered by machine learning. Bringing these efforts under one roof would not only cement the country’s position as a center of excellence in this field but also accelerate the emergence of effective new proteins for clinical applications. The idea is to promote the development of AI-enabled molecule design technology, explore new types of drug-cell interactions, create new databases to further improve the performance of design software, and prepare early-career scientists to seize new research and technology-transfer opportunities. It’s an ambitious endeavor that is likely to captivate scientists for generations to come.

AlphaFold, developed by Google DeepMind, is an AI model that uses a protein’s amino acid sequence to predict how it folds, a structural factor that determines both its function and its ability to interact with its surroundings. AlphaFold 3, the latest version, announced in May 2024, can also model the structures and interactions of proteins with DNA and RNA strands, helping researchers pinpoint cellular mechanisms that play a key role in new drug development. It has been made available to the research community for non-commercial use, allowing scientists around the world to apply it to the development of new therapeutic compounds.