An international group of biomedical researchers has developed a method of mining genetic information from multiple electronic medical records without compromising patient privacy.
Understanding the genetics underlying disease is important for diagnosis, prevention and treatment. Genetic data can help in evaluating risk factors and in targeting genes with new drugs.
The new statistical method is called Sum-Share, short for SUMmary Statistics from multiple electronic HeAlth Records for plEiotropy. Initial simulation runs with electronic health record (EHR) data have shown that Sum-Share has greater statistical power than the standard methods of analyzing data from EHRs.
The Sum-Share research was published Jan. 8 in Nature Communications.
Sum-Share and traditional types of analysis such as PheWAS (short for phenome-wide association study) are used to study the genetics of complex diseases, such as cardiovascular diseases. In particular, Sum-Share promises to be effective in extracting information about genes that may produce two or more seemingly unrelated effects, a phenomenon called pleiotropy.
Large data sets are available through the Electronic Medical Records and Genomics (eMERGE) project and the nationwide UK Biobank, but patient privacy has to be protected when they are used. Sum-Share protects privacy by analyzing summary statistics from different regional or institutional sources instead of accessing individuals' genetic data. The information about the individuals involved stays within each institution's EHRs.
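To make the idea of working from summary statistics concrete, here is a minimal sketch in Python. It is not the Sum-Share algorithm. It uses a standard fixed-effect, inverse-variance meta-analysis as a stand-in, and the site values and the function name `pool_site_summaries` are hypothetical. It assumes each site shares only an effect estimate and a standard error for one genetic variant, while individual-level records never leave the institution.

```python
import math

def pool_site_summaries(site_summaries):
    """Pool per-site (effect estimate, standard error) pairs using
    fixed-effect inverse-variance weighting.

    Illustrative stand-in only; this is NOT the Sum-Share method.
    """
    # Each site's weight is the inverse of its squared standard error.
    weights = [1.0 / (se ** 2) for _, se in site_summaries]
    total_weight = sum(weights)
    pooled_effect = sum(
        w * beta for w, (beta, _) in zip(weights, site_summaries)
    ) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_effect, pooled_se

# Hypothetical summaries for one variant from three institutions.
# Only these two numbers per site are shared, not patient records.
site_summaries = [(0.10, 0.05), (0.14, 0.04), (0.08, 0.06)]

pooled_effect, pooled_se = pool_site_summaries(site_summaries)
z_score = pooled_effect / pooled_se
print(f"pooled effect={pooled_effect:.4f}, SE={pooled_se:.4f}, z={z_score:.2f}")
```

In this simplified setting the pooled standard error is smaller than that of any single site, which gives a rough sense of why combining sites can add statistical power without exchanging individual-level data.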
The researchers' test runs showed that Sum-Share, using only summary statistics, performed better than traditional methods working with individual-level data. For example, the authors report that the Sum-Share method was able to detect pleiotropy from data sets with "greater power than PheWAS in all settings..."
The researchers looked at pleiotropic effects among common conditions associated with cardiovascular disease, such as obesity, hypothyroidism, Type 2 diabetes, hypercholesterolemia and hyperlipidemia, using data from eight different regional EHRs. In one test run, Sum-Share was able to identify 1,734 significant DNA sequences associated with genes involved in cardiovascular disease, compared with only one found when each site was analyzed individually.
The test runs were also designed to show that the results Sum-Share obtained from summary statistics were not produced by random chance. The Sum-Share method was also able to adjust for different ages for each disease.
The researchers note that their Sum-Share algorithm has some limitations, such as dealing with different ethnic populations. Their test runs only analyzed patients of European descent to avoid errors in analysis. They also note that there is still some risk of breaching privacy when using "smaller datasets or for rare events." However, the researchers conclude that "Sum-Share would have a significantly reduced risk of exposing the patients' privacy."