A batch of early coronavirus data disappeared. After a year, it came out of hiding.
In June, an American scientist discovered that more than 200 genetic sequences from samples of Covid-19 patients, isolated in China early in the pandemic, had been removed from an online database. With a little digital detective work, Jesse Bloom, a virologist at the Fred Hutchinson Cancer Center in Seattle, managed to track down 13 of the sequences on Google Cloud.
When Dr. Bloom shared his findings in a report posted online, he wrote that it “seems likely that the sequences were deleted to hide their existence.”
But now a strange explanation has emerged, stemming from an editorial oversight at a scientific journal. And the sequences have been uploaded to a different database, one overseen by the Chinese government.
The story began in early 2020, when researchers at Wuhan University investigated a new way to test for the deadly coronavirus ravaging the country. They extracted a short stretch of genetic material from virus samples taken from 34 patients at a Wuhan hospital.
The researchers posted their findings online in March 2020. That month, they also uploaded the sequences to an online database called the Sequence Read Archive, which is maintained by the National Institutes of Health, and submitted a paper describing their results to a scientific journal called Small. The paper was published in June 2020.
Dr. Bloom became aware of the Wuhan sequences this spring while investigating the origin of Covid-19. Reading a May 2020 review about early genetic sequences of coronaviruses, he came across a spreadsheet that noted their presence in the Sequence Read Archive.
But Dr. Bloom couldn’t find them in the database. On June 6, he emailed the Chinese scientists to ask where the data had gone, but got no response. On June 22, he published his report, which was covered by The New York Times and other media outlets.
At the time, an NIH spokesperson said the authors of the study had requested in June 2020 that the sequences be withdrawn from the database. The authors informed the agency that the sequences were being updated and would be added to a different database. (The authors did not respond to questions from The Times.)
But a year later, Dr. Bloom could not find the sequences in any database.
On July 5, more than a year after the researchers pulled the sequences from the Sequence Read Archive and two weeks after Dr. Bloom’s report went online, the sequences were quietly uploaded to a database maintained by the China National Center for Bioinformation by Ben Hu, a researcher at Wuhan University and a co-author of the Small paper.
On July 21, the disappearance of the sequences was brought up at a press conference in Beijing, where Chinese officials denied allegations that the pandemic started as a lab leak.
According to a translation of the press conference by a journalist at the state-controlled Xinhua News Agency, Dr. Zeng Yixin, the vice minister of China’s National Health Commission, said that the problem arose when editors at Small deleted a paragraph in which the scientists described the sequences in the Sequence Read Archive.
“Therefore, the researchers felt it was no longer necessary to store the data in the NCBI database,” Dr. Zeng said, referring to the NIH-run Sequence Read Archive.
An editor at Small, which is based in Germany and specializes in micro- and nano-scale science, confirmed his account. “The data availability statement was accidentally deleted,” the editor, Plamena Dogandzhiyski, said in an email. “We will very shortly issue a correction that will clarify the error and include a link to the repository where the data is currently hosted.”
The journal posted a formal correction to that effect on Thursday.
It is unclear why the authors did not mention the journal’s error when requesting that the sequences be removed from the Sequence Read Archive, or why they told the NIH that the sequences were being updated. It is also unclear why they waited a year to upload them to another database. Dr. Hu did not respond to an email requesting comment.
Dr. Bloom could not offer an explanation for the conflicting accounts, either. “I am not in a position to judge between them,” he said in an interview.
By themselves, these sequences cannot resolve the open questions about how the pandemic came about, whether through contact with a wild animal, through a leak from a lab, or otherwise.
In their initial report, the Wuhan researchers wrote that they had extracted genetic material “from samples taken from outpatients with suspected COVID-19 early in the outbreak.” But the entries in the Chinese database now indicate that the samples were taken at Renmin Hospital of Wuhan University on January 30, almost two months after the earliest reports of Covid-19 in China.
Although the disappearance of the sequences now appears to be the result of an editorial error, Dr. Bloom felt it was still worthwhile to look for other coronavirus sequences that might be lurking online. “This definitely means we have to keep looking,” he said.