A Seattle researcher retrieved data on genetic sequences of SARS-CoV-2 that had been deleted from a National Institutes of Health database.

By Joseph Guzman
Story at a glance
Jesse Bloom, a computational biologist at the Fred Hutchinson Cancer Research Center, was able to retrieve the data through Google Cloud.
The researcher says the data bolsters evidence that the coronavirus was circulating in Wuhan, China, before a December 2019 outbreak linked to a wet market in the city.
His paper, however, does not draw any conclusions about whether the virus spread naturally from animals to humans or resulted from a laboratory leak.
A Seattle researcher says he’s retrieved data on genetic sequences of the SARS-CoV-2 virus that were deliberately deleted from a National Institutes of Health (NIH) database, fueling the discussion about the mysterious origin of the COVID-19 pandemic.
Jesse Bloom, a computational biologist at the Fred Hutchinson Cancer Research Center, posted a paper on the preprint server bioRxiv Tuesday detailing how he recovered data containing SARS-CoV-2 sequences taken from early coronavirus cases in China that had been stored in the National Library of Medicine’s Sequence Read Archive (SRA) and later deleted.
The paper has yet to be peer reviewed.
Bloom was able to retrieve the data through Google Cloud, which scientists around the world use to share deep sequencing data for others to analyze.
The researcher says the data bolsters evidence that the coronavirus was circulating in Wuhan, China, before a December 2019 outbreak linked to a wet market in the city. The paper, however, does not draw any conclusions about whether the virus spread naturally from animals to humans or resulted from a laboratory leak.
He said his analysis shows that the samples currently used to investigate the origins of the virus may not be complete, adding that the “fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared.”
Some of the data were included in a preprint that Chinese scientists posted in March 2020 and later published in the journal Small in June 2020. The NIH confirmed that the data had been deleted from the SRA at the request of the researcher who originally submitted it, saying such withdrawals are common practice.
“These SARS-CoV-2 sequences were submitted for posting in SRA in March 2020 and subsequently requested to be withdrawn by the submitting investigator in June 2020. The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” NIH said in a statement.
The agency said it could not “speculate on motive beyond a submitter’s stated intentions.”
Bloom has pushed for a more transparent and independent investigation into the origin of COVID-19. He was one of several scientists who signed a letter published in the journal Science calling for further investigation into the origins of the virus, including the lab leak theory.
TheHill.com