Scientists have focused on only a very small subset of the proteome, for a variety of reasons. This inequality in attention has left thousands of proteins poorly understood.
An international group of scientists aims to reduce this discrepancy by calling attention to it, surveying researchers in the field, and then holding a conference to discuss how to remedy the situation.
Their assessment of understudied proteins appears in the journal Nature Methods on May 9, along with an "Open invitation to the Understudied Proteins Initiative."
The researchers characterize the inequality in protein research as "massive." They write that "95% of all life science publications focus on a group of 5,000 particularly well-studied human proteins."
Further, they note that of the 3,000 or so proteins expected to be useful for drugs that fight disease, only 5% to 10% are currently targeted by drugs approved by the U.S. Food and Drug Administration.
Why such inequality?
The researchers spell out many reasons for the inequality. The first is practical: can the protein easily be studied with available experimental tools? For example, many understudied proteins are small and are thus poorly detected with current technology.
Second is the issue of risk. It's easier to get funding for research on already-studied proteins because funding and peer-review systems tend to be "risk averse."
Additionally, research in a well-studied area "enhances the likelihood of being cited, and, consequently, also increases the possibility for high-impact journal publications, which are required for academic success."
There also may be "conceptual biases" in the research system, the authors note, not related to the protein characteristics. The most-studied proteins may be thought to be more important. Or, perhaps, standardized experimental conditions, which may make the research more reproducible, limit the choice of research material.
Finally, the researchers state, the focus on hypothesis-driven research may also contribute to the inequality. How is one to formulate hypotheses on the function of an uncharacterized protein?
The authors present some solutions to the problem, noting that "proteomics [the large-scale study of proteins] is rapidly increasing in throughput, with methods emerging that allow for hundreds of proteomes to be recorded per day on a single mass spectrometer." Other advances, such as improvement in protein structure prediction, are also noted.
The Understudied Proteins Initiative
Juri Rappsilber, a senior author of the paper, is a professor of proteomics at the University of Edinburgh and professor of bioanalytics at the Technical University in Berlin. He described what the Understudied Proteins Initiative aims to achieve.
"The lack of prior knowledge on a protein is a stumbling block on getting the protein studied. This is what we want to address," he said. "We want to provide initial characterization on the function of many proteins--enough, such that researchers can spot those poorly studied proteins that fall into their specific interest spheres and study them in greater detail."
Rappsilber stressed that the current inequalities in protein research are due to the different levels of attention that scientists give to different proteins.
"This is for a variety of reasons, including how large a protein is, how well it expresses, and how abundant it is," he said. "But it also depends on how easy one can publish on it or receive funding.
"The latter two points link to how much is known already about the protein," he added. "We know very little about proteins that nobody works on and, hence, we cannot phrase credible research questions about them that lead to funding and further knowledge."
Wellcome Trust - a key driving force
Rappsilber highlighted the key role of the Wellcome Trust, a London-based charitable foundation focused on health research, in driving the initiative forward.
The Wellcome Trust, he said, recognized that "many proteins involved in rare diseases (which are not rare collectively), or in host-pathogen interactions are poorly understood." Rappsilber recalled that, after reading "a paper led by Georg Kustatscher and me, on using protein co-expression to delineate proteins that act together on a proteome scale in human cells, Tom Collins of the Wellcome Trust contacted me about getting an initiative going."
Rappsilber continued: "Our work is not the only approach that may help in providing basic functional annotation to large numbers of proteins. So we collected an initial team of scientists who are active in this larger area, representing entities such as the Human Protein Atlas and the Human Proteome Organization as well as different geographic regions, to get the ball rolling. Importantly, this is an open initiative and one of our next goals is organizing a meeting for interested scientists to discuss the best route forward."
An encouraging response, but more scientists needed
Since the open invitation to join the initiative appeared in Nature Methods on May 9, Rappsilber has been encouraged by the response.
"We were contacted by numerous scientists who share an interest in understudied proteins," he said. "This is an important achievement as it broadens the base of the initiative and sets a stage for coordinating, comparing, and discussing approaches. We need to widen attention beyond this specific set of scientists."
Rappsilber urged fellow scientists to participate, first by filling out the initiative's questionnaire, which aims to find out "what data make a protein feasible for further research."
"For this, we require as many scientists as possible to participate in our online questionnaire," Rappsilber said. "There they will be presented with a random human protein and should assess if this protein is in their eyes well studied or not.
"We will integrate all these ‘votes’ by machine learning and data sciences to delineate contributions of different data and build an automated assessment that will allow monitoring progress of our venture," he added. "This is essential to define a clear goal, devise a credible approach, secure the necessary funding, and ultimately solve the understudied protein challenge."
Rappsilber's message to those in the field is, "We need you to participate with five minutes of your time, and to tell at least two of your colleagues to do likewise. This is a citizens' science project for scientists."
The article concludes, "This protein function moonshot may also stimulate methodological developments in functional proteomics and may extend to other species."
"This initiative will open up thousands of medically relevant proteins for research," Rappsilber said. "When the human genome project laid the foundation for knowing what proteins exist and AlphaFold for how they look, this initiative will deliver a foundation for what these proteins do."
-------
G. Kustatscher, et al. Understudied proteins: Opportunities and challenges for functional proteomics. Nature Methods, May 9, 2022.