Why is genome data sharing blocked?

Genome data sharing has put biomedical research on the fast track. However, the existing guidelines for data released into the public domain recognize the importance of free and unconditional use of data on the one hand, yet on the other hand fail to reconcile that importance with the "right" of data producers to be the first to publish on their data.

According to Nikos Kyrpides, head of the U.S. Department of Energy's Joint Genome Institute, this contradiction has led to differing interpretations and ongoing debates between data producers and data users over the use of public data.

"The root cause is the lack of clear guidelines for the use of data." In an interview with China Science, Kyrpides stressed again that public data should be regarded as an open resource and used for analysis, interpretation and publication without restriction.

The related paper was recently published online in Science.

Running into "soft obstacles" from time to time

The free use of public genomic data is a tradition and a consensus in international life science research. Since the Human Genome Project was carried out, a large volume of open, shared genomic data has greatly advanced biomedical research.

The Human Genome Project, which started in 1990 and included China's participation, is regarded as one of the great projects in the history of science. Three years ago, Eric Green, James Watson and Francis Collins, who led the project, wrote an article in Nature summarizing six lessons of the Human Genome Project, one of which is the maximization of data sharing.

It was the Human Genome Project that changed the norms of data sharing in biomedical research and led to the Bermuda Principles of 1996, under which genome sequencing data exceeding a certain scale were to be submitted to a public database within 24 hours of being generated.

The push for data sharing has continued ever since, and new changes have taken place. The Fort Lauderdale Agreement of 2003 reaffirmed and expanded the Bermuda Principles, holding that the pre-publication release of large-scale genome sequence data would be of great benefit to the scientific community, while pointing out that this data sharing is limited to community resource projects.

Since the agreement was signed, how to achieve wider, faster and more effective data sharing has been a recurring topic of discussion in academic circles.

With data sharing as the norm, the relevant genomic data are generally released and shared when an academic paper is published. "However, the genome data produced by various government-funded scientific research projects are far more extensive, and the degree of sharing before a paper is published is extremely low," Zhang Guoqing, a researcher at the CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, told China Science.

When Zhang Guoqing used international genome data, he was required to fill in an application, "but because the review mechanism is opaque, 'soft obstacles' came up from time to time."

"Data sharing policy is not static. Many funding agencies have already fine-tuned their policies," Kyrpides said. For example, the genomic data sharing policy formulated by the National Institutes of Health in 2014 is creating a more complete data sharing ecosystem, "which was not available in previous agreements."

"Isn’t this a contradiction?"

Developments since then "prove that the Fort Lauderdale Agreement is outdated and needs to be revised to reflect the current state of science and technology." Kyrpides noted that the agreement is generally limited to community resource projects and does not cover all sequencing projects.

In the interview with China Science, Kyrpides also pointed out contradictions within the Fort Lauderdale Agreement.

According to the agreement, data released into the public domain should be usable by anyone without any restrictions, and these data should be released before publication so as to benefit the whole community.

Over the years, gene sequencing has produced numerous data sets, many of which have been released publicly without an accompanying publication.

However, the agreement also mentions that "anyone who wants to use unpublished public data should first obtain permission from the data producer," Kyrpides said. "Isn’t this a contradiction?"

The researchers also noted that those who favor restricting the use of public genomic data usually give two reasons. One is that unverified pre-released data may contain errors, and the other is that generating new data often takes a long time.

In Zhang Guoqing’s view, the main reason data use is restricted is that the rights and interests attached to the data are not clearly defined, which makes it difficult to protect the interests of all parties involved in sample provision, data generation, data management and data analysis.

In addition, unclear security management requirements for personal information tied to genomic data, such as sensitive data, are another reason.

"We acknowledge that some restrictions may be appropriate for existing sensitive human genetic data," Kyrpides added.

However, the researchers found that resistance to sharing sensitive data is gradually easing. Across the entire biomedical literature, about one fifth of the articles published from 2015 to 2017 shared their original data, a significant increase over previous years.

Establishing principles of use

"Unrestricted use of public data should be consistent with the academic reward system." Kyrpides believes that funding agencies need to recognize the significance of data sharing and give appropriate credit to the scientists who generate the data.

It is also important to "determine effective methods to support the generation of protocols and specific data sets after describing data generation." Kyrpides told reporters that it is necessary to re-examine the data release strategies of funding agencies and journal publishers.

The researchers believe that journal publishers need to reconsider their publishing policies, specifically the availability of data at the time a manuscript is submitted for publication. Kyrpides et al. suggested that sequence data and the associated metadata should be provided free of charge, together with the detailed protocols, when the manuscript is submitted for peer review rather than after publication.

"To promote the development of genomics, we need to formulate strong policies, promote open and unrestricted data sharing, and promote inclusive, community-driven research and training."