Updated: Details of volunteers in UK Biobank, which describes itself as the custodian of the world's most comprehensive biomedical dataset, are for sale on Chinese e-commerce site Alibaba.
The organization confirmed the data on roughly half a million volunteers was anonymized, but could not guarantee it would be impossible to identify individuals if it fell into the wrong hands.
The revelation came from UK technology minister Ian Murray, who was speaking in the House of Commons on Thursday as Biobank confirmed the data mishap.
Updated to add at 1525 April 23:
Three Chinese research institutions have been banned from UK Biobank's platform after the data belonging to half a million volunteers was listed for sale on Chinese e-commerce site Alibaba.
UK Biobank is a charity that runs the eponymous research project. It describes itself as the custodian of the world's most comprehensive biomedical dataset that's used by medical researchers globally.
The charity confirmed to the UK government on April 20 that an unknown source had put three separate listings of data up for sale online, one of which contained data belonging to all 500,000 UK participants. The revelation came from UK technology minister Ian Murray, who addressed the House of Commons on Thursday as UK Biobank confirmed the data mishap via its website.
Both Murray and UK Biobank said the data was anonymized, but neither could be wholly certain that it couldn't be used to identify individuals if it ended up in the wrong hands. Investigations into the abuse of data are ongoing, but there is currently no evidence to suggest that the data was bought or downloaded. Murray said that the Chinese government played a major role in getting the listings taken down, as did Alibaba.
"I want to thank the Chinese government for the speed and seriousness with which they worked with us to help remove these listings and the ongoing work to remove any further listings," said Murray.
The tech minister added that although the three institutions from which the data was derived were Chinese, this fact alone suggests nothing about the intent behind the data's listing.
UK Biobank revoked the accreditation of the three research institutions, meaning they can no longer access the charity's platform or its data. Other institutions, such as Yale University, have also previously had their access revoked for "a breach of data," Murray confirmed.
A root cause analysis remains ongoing, although the current thinking is that the three Chinese institutions downloaded the bulk UK Biobank dataset to local storage, and through means yet to be identified, the data was listed for sale on Alibaba.
In 2024, UK Biobank changed the way accredited institutions access volunteers' data. It previously handed bulk datasets to said institutions for research purposes, but moved to a model in which only UK Biobank stores the data and accredited researchers are given logins to the UK Biobank platform. Researchers carry out their data analysis on the platform and download the results of that analysis, not the data that informed it.
"What the system also allowed you to do, although you were contractually as an accredited organization not supposed to do, is download the datasets," Murray told the Commons.
"We understand from UK Biobank that this is probably what happened here - those three institutions have downloaded the datasets themselves, and we are yet unclear about how those data sets have ended up on that website, but the UK Biobank and institutions and organizations attached to government are working through that at the moment."
UK Biobank's response
UK Biobank said that the data listed for sale contained no personally identifiable information, such as names of the volunteers, their addresses, phone numbers, or NHS numbers, and expressed its gratitude to the authorities that helped remove the listings.
The charity did not specify the types of data that were included, but Murray stated in the Commons that several markers were included in the listings:
- Gender
- Age
- Month and year of birth
- Assessment center data
- Attendance dates
- Socioeconomic status
- Lifestyle habits
- Measures from biological samples related to haematology, biology, and chemistry
- Sleep, diet, work environment, mental health, and health outcomes data.
UK Biobank told the government that it could not be 100 percent sure that the data could not be used to identify a volunteer, but said doing so would require highly advanced interpretation of the data.
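For readers wondering how data with no names or NHS numbers could still point to a person, the usual concern is that a combination of ordinary fields, such as those Murray listed, may be unique to one record. The sketch below illustrates that idea with a simple k-anonymity check. It is not UK Biobank's method, the records are entirely synthetic, and the field names (`sex`, `birth`, `center`) and helper functions are hypothetical.

```python
from collections import Counter

# Entirely synthetic records. Field names are illustrative assumptions,
# not UK Biobank's actual schema.
records = [
    {"sex": "F", "birth": "1958-03", "center": "A"},
    {"sex": "F", "birth": "1958-03", "center": "A"},
    {"sex": "M", "birth": "1958-03", "center": "A"},
    {"sex": "M", "birth": "1961-11", "center": "B"},
    {"sex": "M", "birth": "1961-11", "center": "B"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Return the smallest group size, i.e. the k in k-anonymity."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Return rows whose combination of fields appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    fields = ["sex", "birth", "center"]
    print("k using sex only:", k_anonymity(records, ["sex"]))
    print("k using all three fields:", k_anonymity(records, fields))
    print("rows singled out:", unique_rows(records, fields))
```

A k of 1 means at least one record is unique on those fields alone. Anyone who already knows those few facts about a person could, in principle, pick out that row and read everything else attached to it.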
In a statement issued on Thursday, UK Biobank said it had introduced a number of security improvements in the wake of the findings. "We have temporarily suspended all access to the UK Biobank research platform, while we put in place a strict limit on the size of files that can be taken off the platform," said Professor Sir Rory Collins, CEO and principal investigator of UK Biobank.
"This measure will allow researchers to export the results of their research, while severely limiting their ability to take any de-identified participant data off the platform. In addition, all files exported from the research platform will be monitored daily for any suspicious behavior. These security measures will further minimize the potential for misuse of UK Biobank data. In addition, we will conduct a comprehensive and forensic board-led investigation of this incident.
"We are developing the world's first automated checking system able to prevent de-identified participant data from being taken off the UK Biobank research platform, without preventing the important research that is being done by thousands of scientists around the world. We intend to have this automated system in place around the end of this year."
UK Biobank launched its project in 2012, and the anonymized data it provides experts (institutions in Russia, Iran, and North Korea are banned) informs leading medical research into conditions such as dementia, cancer, Parkinson's disease, chronic pain, COVID-19 immunity, and more.
Despite the "unacceptable abuse" of medical data in this case, the UK expects UK Biobank to be the world's leading provider of biomedical data for research institutions going forward.
The charity reported the incident to the UK government on April 20 and reported itself to the Information Commissioner's Office shortly after.
"People's medical data is highly sensitive information, not only do people expect it to be handled carefully and securely, organizations also have a responsibility under the law," an ICO spokesperson told The Register.
"UK Biobank has made us aware of an incident and we are making enquiries."
The Register contacted Alibaba for more information. ®
