
Constructing a Trustworthy Research Environment for Biomedical Data: The Health Big Data Sustainability Platform

2024.09.20

CONTRIBUTING TEAM: Yu-Tai Wang, Chang-Wei Yeh, Yen-Jen Lin


The medical industry has entered an era of digitization, and harnessing data to improve the quality of academic research and medical services has become an important strategy for the industry-academia-research community. Data science is essential to the development of the medical industry, but personal information, such as genetic data or disease history, must also be strictly protected. Recent development in the medical industry has therefore focused on three questions: how to let researchers access data remotely and easily without taking the sensitive data away, how to properly manage the data and protect data subjects’ rights once the data has been obtained, and how to ensure that the data is used legally and in compliance with regulations.

At present, various international practices address these issues, a typical example being The Cancer Genome Atlas (TCGA) in the United States. TCGA is a globally significant cancer genomics program that has molecularly characterized over 20,000 primary cancer samples and corresponding normal samples from the same patients, covering 33 cancer types. The dataset includes tumor information and genome sequencing data; some of the data is publicly available, and the rest is accessible once an application for use has been approved. Its users include scientists, researchers, academic institutions, and industrial entities in the United States, and the data can also be accessed from around the world. The hope is that this comprehensive cancer genome database can serve as shared knowledge to advance cancer research globally.

However, TCGA data comes mainly from white and African American patients, whose genetic backgrounds differ from those of the Taiwanese population. To advance precision medicine in Taiwan, establishing a domestic population disease cohort database is crucial. The National Science and Technology Council (NSTC) aims to establish a Taiwan-specific disease genome database as part of the Sustainable Health Big Data Platform Program. In the future, this database is expected to be integrated with the health insurance database, providing comprehensive pre- and post-disease data for academia and industry. Combined with artificial intelligence technology, this integration is expected not only to enhance disease prediction tools but also to enable more accurate and effective medical care.

Flowchart of the Health Big Data Sustainability Platform
The workflow of Cloud Data Analysis and Sharing Platform

NCHC is responsible for establishing a trusted information infrastructure platform. Authorized researchers can analyze and use the data within Taiwan's strict regulatory framework, while the original data never leaves the analysis environment. This approach balances the protection of personal data against the need to upgrade the medical industry. At present, we have achieved preliminary results in building a shared national biomedical data information system.

In the Sustainable Health Big Data Platform project, NCHC is authorized to act as the data processor and has partnered with eight medical centers to collect data on approximately 4,000 patients, including tumor genome sequences, medical images, and electronic health records. We also focus on standardization and quality assurance. To that end, a comprehensive standard format and an upload system were developed so that hospitals can easily upload data, confirm its quality, and obtain real-time statistical information.

Establishing a trusted environment and an upload system has been a pioneering endeavor. Very few similar efforts exist anywhere in the world, and this is the first of its kind in Taiwan. Through the joint efforts of NCHC and the medical centers, we have overcome a variety of challenges. For instance, the participating medical centers have varying levels of computerization, so it has been necessary to make adjustments, review them frequently, and keep communicating with one another. In the end, we successfully worked out a mechanism that works for all parties.

In addition to active communication, we are committed to improving the efficiency of system operations. Medical image data processing is one example. The NCHC team and software developers in Taiwan jointly designed an automated mechanism that checks data integrity and then uploads the data. This replaced the initial manual checking, which was very inefficient and delivered poor results. The system was built from scratch, which not only allowed the partners to accumulate experience but also helps train biomedical informatics professionals in Taiwan.
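
As an illustration only, an automated integrity check of this kind could be sketched as follows. The article does not describe how the actual mechanism works, so everything here is an assumption: the use of SHA-256 checksums, the idea of a manifest that maps file names to expected digests, and the names `sha256_of` and `check_integrity` are all hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_integrity(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Compare files in `folder` against a {file name: expected digest} manifest.

    The manifest format is an assumption made for this sketch. The function
    returns a list of problems; an empty list means every listed file is
    present and matches its expected checksum.
    """
    problems = []
    for name, expected in sorted(manifest.items()):
        path = folder / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems
```

In a sketch like this, a batch of images would be passed on to the upload step only when `check_integrity` returns an empty list, which takes the place of checking each file by hand.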

To achieve the goal of precision medicine, we believe that establishing a comprehensive medical dataset is the right approach. The dataset should include at least four categories: medical imaging, tumor genome sequences, digital pathology, and digital medical records.


Trusted cloud environment
System architecture of the Trusted Cloud Environment being constructed by NCHC (2024-2027)


Furthermore, NCHC has initiated the planning and development of an expandable, secure, high-performance cloud computing environment to replace isolation analysis rooms. This environment will offer remote access for analyzing medical data, effectively preventing unauthorized extraction of original data. In the future, we envision offering platform services such as health big data sustainability platforms, cancer moon-shot datasets, and secure analysis of private bioindustry data. Through these initiatives, we aim to lay a solid foundation for the development of domestic biomedicine in Taiwan.