Arizona State-led scientists recommend methods to improve statistical inference in population genomics

Marjorie Hecht
Oct 4, 2022

Population genomics compares the genetic variations in DNA within and between specific biological populations, looking at the influence over time of processes like natural selection, genetic drift and other factors. 

The aim is to infer how evolutionary changes occurred, whether the subject of study is human population growth or viral resistance to a new drug.

Mathematical models are used in the study of population genomics to infer evolutionary processes, reconstruct population changes or measure the rate and function of mutations. At the same time, advances in sequencing technologies have provided an enormous amount of molecular data for use in population genomics.

Recently, an international group of eminent scientists led by Jeffrey Jensen, a professor and researcher at Arizona State University's School of Life Sciences, proposed ground rules to keep these mathematical models from producing inaccurate or biased results. Their work was published May 31 in the journal PLOS Biology.

'Perils' of current practice

The authors present a consensus view of the "perils" of current computer modeling practice and what needs improvement. 

"[We] argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties."

In brief, the researchers are concerned that some models reach the wrong conclusions because they start from a faulty hypothesis and interpret the data to fit it. They call for more rigor.

Best practices

Current Science Daily asked Parul Johri, lead author of the article, to summarize the main points of the group's consensus view of best practices. She is a postdoctoral researcher working with Jeffrey Jensen at Arizona State University.

"We can use genomic variation or variation in DNA sequence data from individuals in a population to make inferences about how the population may have undergone historical changes in size or how certain gene sequences may have been targeted when the population experienced adaptation in the recent past," Johri said. "As sequence data is now becoming abundant, such inferences are made much more frequently."

Where do the models get it wrong, and how does the problem get fixed? 

"Currently, models of adaptive evolution tend to be fitted to such data often without taking into account many other stochastic [random] processes in effect," Johri said. "We argue that we should first construct a detailed baseline model that includes constantly operating evolutionary processes in natural populations, and then try to assess if adaptive evolution is statistically distinguishable or discernible from other processes."

Johri elaborated on how this can skew the understanding of evolutionary processes. 

"Using incorrect models can lead to a very incorrect and yet a confident estimation of evolutionary processes," she said. "While it might not matter for some questions, it can matter a lot for others. For instance, when we want to understand the role of adaptation in shaping genomic diversity in natural populations."

Recommendations for improvement

The authors make specific recommendations for improving statistical models to produce more accurate inferences of how a population changes. 

"We propose that we should first construct a detailed baseline model that includes variation and uncertainty in the stochastic non-adaptive evolutionary processes that occur constantly in natural populations," Johri said. "We can then try to assess if adaptive evolution is statistically identifiable, allowing us to really ask, 'which questions in evolution are addressable?'

"Finally," she added, "we propose that one should thoroughly evaluate how the evolutionary processes not part of the assumed model would affect inferences from genomic data, so that we are aware of the limits of our inference procedure." 

The group ran simulations on a genome resembling that of Drosophila melanogaster, the common fruit fly, and modeled different scenarios to show how an analysis built on an inaccurate hypothesis can reach the wrong conclusions.

They also provide a flow chart to help researchers build a better statistical model.
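
The logic of testing against a baseline model can be illustrated with a toy simulation. The Python sketch below is not the authors' code and does not reproduce their fruit fly simulations. It assumes a hypothetical observation (one gene variant rising from 20% to 50% of a population in 20 generations) and asks how often random genetic drift alone would produce a rise that large under two assumed baselines: a population of constant size and one that passes through a bottleneck. All names and numbers in it are illustrative assumptions.

```python
import random


def simulate_drift(p0, copies_per_generation, rng):
    """Return the final allele frequency under neutral Wright-Fisher drift."""
    p = p0
    for n_copies in copies_per_generation:
        # Each gene copy in the next generation is drawn at random from the
        # current one, so the new allele count is a binomial sample.
        count = sum(rng.random() < p for _ in range(n_copies))
        p = count / n_copies
    return p


def empirical_p_value(observed, p0, copies_per_generation, replicates=300, seed=1):
    """Share of neutral simulations ending at or above the observed frequency."""
    rng = random.Random(seed)
    hits = sum(
        simulate_drift(p0, copies_per_generation, rng) >= observed
        for _ in range(replicates)
    )
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (replicates + 1)


# Hypothetical observation: an allele rises from 20% to 50% in 20 generations.
START, OBSERVED = 0.2, 0.5

# Baseline A (assumed): population size stays constant at 1,000 gene copies.
CONSTANT = [1000] * 20
# Baseline B (assumed): same span, with a 16-generation bottleneck of 40 copies.
BOTTLENECK = [1000] * 2 + [40] * 16 + [1000] * 2

if __name__ == "__main__":
    for name, sizes in [("constant size", CONSTANT), ("bottleneck", BOTTLENECK)]:
        p_value = empirical_p_value(OBSERVED, START, sizes)
        print(f"{name}: empirical p-value = {p_value:.3f}")
```

Under the constant-size baseline, drift almost never produces the observed rise, which would tempt an analyst to credit natural selection. Under the bottleneck baseline, drift alone produces it far more often. The same data support different conclusions depending on which non-adaptive processes the baseline includes, which is the kind of error the authors warn against.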

Wider relevance  

The authors, drawn from several countries, are all experts in population genomics. The project grew out of discussions with her adviser, Jeffrey Jensen, Johri said.

"We thought that it would be very important to get feedback, during the development of this project, from other important thinkers in the field," she said. 

Johri emphasized, however, that the project's lessons are not limited to population genomics and can apply to other fields that use statistical inference.

"The project also demonstrated how biased inference of evolutionary parameters can be when the assumed model is incorrect," she added. "As scientists may commonly make such simplifying assumptions [that hold for some organisms or scenarios but not all], our results are relevant to a wide audience."

__________

Johri, Parul et al. "Recommendations for improving statistical inference in population genomics," PLOS Biology, May 31. https://doi.org/10.1371/journal.pbio.3001669