Designing new enzymes to optimize their features for performing roles in research, biology and biomedicine has outstanding promise for advancing scientific capabilities.
But it's not an easy task.
The new creation has to be stable and efficient, without being hijacked by unpredictable genetic mutations that dilute its activity.
A group of researchers at the Weizmann Institute of Science in Israel has developed a method, using artificial intelligence, to produce thousands of new enzymes that are functionally optimized. The AI programs they created allow them to process a massive number of possibilities to optimize the characteristics of the resulting product.
The researchers tested and refined their computational method to produce a new library of structurally diverse enzymes. Their AI programs also can be applied to other enzyme and protein families. They make the algorithms and calculations freely available to other researchers.
Their work appears in Science magazine, Jan. 12.
Combining fragments
The authors explain their method by using an analogy from electrical engineering, where innovation results from combining modular parts that already exist to create a new device. They note that immune system antibodies are able to do this, using bits of genetic fragments to create billions of new binding proteins to use against pathogens. Their study asked similarly "whether enzymes could be generated from combinable fragments."
The researchers began with endoxylanases, members of the glycoside hydrolase family of enzymes, which break down polysaccharides in plant biomass. To find the most useful potential fragments, they developed a method called CADENZ, for combinatorial assembly and design of enzymes.
At its core, CADENZ employs a machine-learning method the researchers developed that they called EpiNNet, for epistasis neural network. EpiNNet sorts through millions of different combinations of fragments to select a small set that can be freely combined, similar to the fragments that generate our body’s antibodies.
They then tested their predictions experimentally using activity-based protein profiling (ABPP) to rank the resulting protein activities. Next they designed and applied a second machine-learning program to determine what molecular properties were common to the enzymes that were highly ranked by ABPP.
With these learned properties, the researchers designed a "second-generation library that demonstrated an order of magnitude increase in the success rate of obtaining functional enzymes."
Screenshot from the Weizmann Institute's Laboratory for Protein Design video. Credit: Laboratory for Protein Design
An Interview with Sarel Fleishman
Using AI to create functional new enzymes and proteins for specific purposes
Dr. Sarel Fleishman, senior author, heads the Laboratory for Protein Design at the Weizmann Institute of Science in Israel, where the mission is to "understand how proteins carry out their exquisite molecular functions." He spoke to Current Science Daily about his laboratory's work with combinatorial assembly and design of enzymes.
What are the problems facing researchers who are trying to design new proteins, and why is it important to find a more efficient way of doing this?
The primary problem in design is how to improve or change the molecular activity of a protein. There are myriads of natural proteins, including enzymes and binding proteins, that could contribute dramatically to human health or industry; for instance by eliminating disease-causing bacteria and viruses, breaking down toxic compounds or synthesizing beneficial ones.
But natural proteins are almost always far from optimal for human needs. They typically exhibit low stability and activity, and we need to find an effective and general way to improve these critical properties by mutating them. The problem is that proteins form intricate three-dimensional structures, and the impact of mutation on their structure and activity is often unpredictable.
Our goal in this project was to develop a way to massively sample sequences and structures of enzymes in order to change their activity profile.
How does epistasis make it difficult to create novel enzymes?
Epistasis is a widely observed phenomenon in which the impact of mutations on protein activity is not predictable. For instance, there are approaches to probe the effects of every single mutation in a protein. These are powerful (though laborious).
But a critical problem is that many mutations that are individually beneficial destroy function in combination, and vice versa: Mutations that are individually destructive can combine to alter activity in beneficial ways.
Epistasis is the most significant constraint on our ability to engineer or design new activities. Of course, it is also a major constraint on natural evolutionary processes which are therefore slow to respond to changes in the environment.
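To make the idea concrete, the short Python sketch below compares the activity one would expect if two mutations simply added up with the activity actually observed for the double mutant. It is only a toy illustration: the additive model, the function names and every number are assumptions made up for this example and do not come from the study.

```python
# Toy illustration of epistasis. All values are hypothetical.
# Activities are expressed relative to the wild type (wild type = 1.0).

def additive_expectation(single_activities):
    """Activity expected if each mutation's effect simply added up."""
    return 1.0 + sum(activity - 1.0 for activity in single_activities)

def epistasis(observed_combined, single_activities):
    """Observed combined activity minus the additive expectation."""
    return observed_combined - additive_expectation(single_activities)

# Hypothetical measurements: each mutation alone improves activity...
single_a = 1.25
single_b = 1.5
# ...but the double mutant is nearly inactive.
double_ab = 0.25

expected = additive_expectation([single_a, single_b])
deviation = epistasis(double_ab, [single_a, single_b])
print(f"expected {expected}, observed {double_ab}, deviation {deviation}")
```

A deviation far from zero, as in this made-up case, is what makes the outcome of combining mutations hard to predict from single-mutation data alone.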
How did you use machine learning to help generate structurally diverse stable enzymes from fragments?
Machine-learning methods are remarkably effective in finding patterns in complex, multidimensional data. In our work we used an atomistic strategy called Rosetta to generate models of combinations of all the designed fragments and we computed the stability and the likelihood that each such combination would be functional.
We wanted to focus our experiments on the designed combinations that were most likely to be active. Therefore we looked for a small but optimal set of fragments. Optimality in this case is the likelihood that any combination of fragments in this small set is stable and functional.
Rosalie Lipsh-Sokolik, the graduate student who is the lead author, tested many alternative computational approaches, some simple and some quite complex, but in the end, the machine learning approach, which we called EpiNNet for epistasis neural network, was most effective, efficient and scalable. In fact we’re now using it to design libraries of billions of proteins.
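The selection goal described here, a small set of fragments in which any combination is acceptable, can be sketched in a few lines of Python. The sketch is not EpiNNet, which is a neural network. It swaps in a simple greedy pruning rule, and the fragment names, the clash list and the `no_clash` test are hypothetical stand-ins for the atomistic stability calculations the researchers describe.

```python
from itertools import combinations, product

def failing_combinations(fragments_by_segment, is_acceptable):
    """Return every full-length combination that the predicate rejects."""
    segments = sorted(fragments_by_segment)
    combos = product(*(fragments_by_segment[s] for s in segments))
    return [combo for combo in combos if not is_acceptable(combo)]

def prune_to_compatible_set(fragments_by_segment, is_acceptable):
    """Greedily drop the fragment found in the most failing combinations
    until every remaining combination is acceptable.
    Assumes fragment names are unique across segments."""
    remaining = {s: list(frags) for s, frags in fragments_by_segment.items()}
    while True:
        failures = failing_combinations(remaining, is_acceptable)
        if not failures:
            return remaining
        counts = {}
        for combo in failures:
            for frag in combo:
                counts[frag] = counts.get(frag, 0) + 1
        # Ties are broken alphabetically so the result is deterministic.
        worst = max(sorted(counts), key=counts.get)
        for frags in remaining.values():
            if worst in frags:
                frags.remove(worst)

# Hypothetical fragments for three segments and a made-up list of clashing pairs.
fragments = {
    "seg1": ["a1", "a2", "a3"],
    "seg2": ["b1", "b2"],
    "seg3": ["c1", "c2"],
}
clashes = {frozenset(p) for p in [("a3", "b1"), ("a3", "c2"), ("b2", "c2")]}

def no_clash(combo):
    """Stand-in acceptability test: no pair of fragments is on the clash list."""
    return not any(frozenset(pair) in clashes for pair in combinations(combo, 2))

selected = prune_to_compatible_set(fragments, no_clash)
print(selected)
```

Exhaustive enumeration like this only works for toy sizes. With millions of combinations, a learned model that predicts which fragments are compatible becomes the practical route, which is the role the interview assigns to EpiNNet.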
How does CADENZ work, and what is the role of EpiNNet in improving its results?
CADENZ stands for Combinatorial Assembly and Design of Enzymes. We start with a set of naturally occurring protein structures that carry out a similar though not identical molecular activity. We then break these structures into their constituent fragments along points that, in principle, allow the fragments to be recombined between different proteins.
But combining natural pieces produces a very high level of molecular strain due to epistasis. The fragments are each compatible with a different molecular context but almost never with one another. To alleviate this strain, we design the fragments, increasing their stability, and then use the EpiNNet approach to select fragments that are minimally strained in combination with one another.
EpiNNet’s impact on the results was astounding. In the very first experimental screen Rosalie carried out, she isolated more than 3,000 active enzymes compared to a few dozen active enzymes that are typically found in an engineering or design study.
But this was only the beginning. We knew that our modeling approach had limitations and were certain that the rich experimental dataset would enable machine learning to uncover these limitations.
Indeed, by applying machine-learning methods, Rosalie uncovered features that correlated with active enzymes, and with this understanding in hand, we improved CADENZ such that in a second screen it produced more than 12,000 enzymes. Because the second screen was more focused than the first, this translated to a tenfold improvement in success rate through one cycle of learning, demonstrating the huge potential of combining advanced atomistic design, experimental screening, and machine-learning methods.
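The "learn" step in this cycle, looking for features that separate active from inactive designs, can be illustrated with a deliberately simple calculation. The Python sketch below is far cruder than the machine-learning methods mentioned in the interview: it only compares the average value of each feature between the two groups. The feature names and the four data points are hypothetical and are not taken from the published screens.

```python
from statistics import mean

def feature_gaps(designs):
    """For each feature, mean over active designs minus mean over inactive ones."""
    active = [d for d in designs if d["active"]]
    inactive = [d for d in designs if not d["active"]]
    features = [name for name in designs[0] if name != "active"]
    return {
        name: mean(d[name] for d in active) - mean(d[name] for d in inactive)
        for name in features
    }

# Hypothetical screen results: two generic features plus an activity label.
designs = [
    {"feature_x": 1, "feature_y": 8, "active": True},
    {"feature_x": 0, "feature_y": 10, "active": True},
    {"feature_x": 4, "feature_y": 9, "active": False},
    {"feature_x": 5, "feature_y": 9, "active": False},
]

gaps = feature_gaps(designs)
print(gaps)
```

In this made-up data, `feature_x` differs sharply between the groups while `feature_y` does not, so a second-round library would favor designs with low `feature_x`. Real screens involve thousands of designs and many interacting features, which is why a proper machine-learning model is needed in practice.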
What were your main findings?
Our main finding is an effective and general approach to address epistasis. By addressing epistasis, we can access a huge space of functional proteins that are almost completely inaccessible to protein-engineering approaches or to natural evolution.
In addition we found an effective way to learn how to design improved protein libraries. CADENZ enables a design-build-test-and-learn cycle that is key to advances in engineering but is quite difficult to implement in proteins because of the unpredictability that epistasis imposes.
You used endoxylanases with CADENZ. Could another protein family be used?
Yes, there are thousands of enzyme families that exhibit the structural modularity seen in the endoxylanases that were the targets of our study.
There are also many binding proteins, not least antibodies, that are structurally modular. More generally still, the core innovation in CADENZ, the EpiNNet strategy of identifying compatible mutations, can be applied to any protein, not just structurally modular ones.
In a paper that will be published soon, we demonstrate how EpiNNet can generate thousands of functional mutants in the active site of another protein and that many of these mutants exhibit dramatic and useful changes in the protein’s functional properties. We believe that by addressing epistasis, CADENZ and EpiNNet provide a way to very rapidly and effectively generate huge spaces of functional proteins starting from essentially any natural protein.
How do you think others will be able to use your work?
There are numerous problems in research and biotechnology where superior protein activities are desired. Examples include optimal affinity or specificity of a therapeutic antibody, or improved catalytic efficiency and product profile of an enzyme. I am certain that CADENZ and EpiNNet could be used in those areas.
What are your next steps?
There are so many opportunities that we were always interested in but did not have the proper tools! In the next few years, our focus will be on therapeutic antibodies, how to make stable, high-affinity, human antibodies efficiently and economically, and on enzymes for sustainability research. Endoxylanases were just one example.
Together with our collaborators, we are looking at enzymes that could be used to oxidize or alter the biological activities of small molecules. Such enzymes could be extremely useful in the chemical and biomedical industries.
Just as important, we are working to make our methods accessible to all scientists, because the number of applications is far greater than what we could handle ourselves. The source code for running EpiNNet and CADENZ is published as part of our paper, and we are working on web accessible versions that would make it even simpler for non-experts to apply them.
__________________
R. Lipsh-Sokolik et al. "Combinatorial assembly and design of enzymes." Science, Jan. 12.