Photo courtesy CDC/Janice Haney Carr

AlphaFold drastically improves artificial intelligence-predicted protein structures

An artificial intelligence system known as AlphaFold has enabled an unprecedented expansion of the knowledge of the structure of tens of thousands of previously unmapped proteins, based merely on their amino acid sequences.


Laurence Hecht
Mar 30, 2022

Proteins are long chains of simpler molecules known as amino acids. But knowing the sequence of amino acids alone does not reveal what the protein looks like. The vast number of ways these chains can twist and fold allows a great many possible shapes.

Previously, determining the shape, or conformation, of a protein was a time-consuming process that relied on X-ray crystallography, nuclear magnetic resonance, or cryogenic electron microscopy. As a result, the number of well-characterized protein structures is only a fraction of the hundreds of millions of known proteins.

All that has changed with the development of the AlphaFold Protein Structure Database, an openly accessible database of high-accuracy protein structure predictions.

A report on the initial public release of the database appeared in January in the journal Nucleic Acids Research. The work is a collaboration of 27 scientists from the European Molecular Biology Laboratory in Hinxton, United Kingdom, and DeepMind Technologies, a British artificial intelligence research laboratory owned by Alphabet Inc.

Protein shape and function

Proteins are the workhorses of the cell, performing tasks ranging from catalyzing chemical reactions to building and copying DNA. Knowing what a protein looks like can reveal much about how the protein functions.

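A protein's structure is described at four levels: the amino acid sequence (primary structure), local folding patterns (secondary), the overall three-dimensional shape of a chain (tertiary), and the assembly of multiple chains (quaternary). Because structure determination had been such a labor-intensive process, a significant gap existed between the protein structures that were known and those scientists would like to know.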

AlphaFold has drastically improved this process by producing much higher-quality structural predictions. The new database provides access to the predicted atomic coordinates, along with interactive visualization of each structure.

100 million structures in 2022

The first release, described in the paper, which was first published online in November 2021, provides free access to more than 360,000 predicted protein structures across 21 different organisms. These include the proteomes of the human, mouse, rat, fruit fly, zebrafish and nematode worm; the bacteria E. coli, Staphylococcus aureus and Mycobacterium tuberculosis; and soybean, maize, rice and yeast.

By contrast, conventional methods have taken since the 1950s to determine and archive a bit more than 180,000 structures of proteins and nucleic acids.

The authors of the paper predict that in 2022 the AlphaFold database will grow to more than 100 million structures.

A first update to the database will provide structural predictions covering more than 1 million proteins. “This will be followed by another update in 2022 to include structures for most representative sequences from the UniRef90 data set [>100 million structures],” the authors forecast.

–––––––––––––

Mihaly Varadi et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research (2022). DOI: https://doi.org/10.1093/nar/gkab1061

