Recent advancements in generative artificial intelligence (AI) and natural language processing (NLP) have positioned large language models (LLMs) as transformative tools across various industries, particularly healthcare. These technologies can enhance patient care, streamline administrative tasks, and advance medical research by analyzing extensive clinical data from electronic health records (EHRs), medical imaging, and genomics. However, applying LLMs in clinical medicine raises privacy concerns, particularly about the inadvertent leakage of confidential information. EHRs contain sensitive patient data, which makes proper de-identification crucial for advancing research. Integrating LLMs into healthcare therefore presents challenges for data privacy and for safeguarding sensitive health information (SHI) within medical documents. Sophisticated algorithms are needed to identify and remove SHI from unstructured clinical texts while taking into account medical context, terminology, and evolving data privacy regulations. To address these challenges, the 2023 SREDH/AI CUP competition was held with two subtasks: SHI Recognition, which focused on recognizing SHI within clinical texts, and Temporal Information Normalization, which aimed to standardize temporal information to ensure consistency across medical records. An international workshop was held as a closing event in 2024, and its proceedings were published.
Building on the success of the SREDH/AI CUP 2023 competition, the 2025 competition aims to strengthen the protection of sensitive information when AI applications are introduced in healthcare settings. It is organized around the following two subtasks:
Develop Automatic Speech Recognition (ASR) technology to convert spoken dialogue into text records.
Develop a system capable of identifying sensitive personal information mentioned in speech recordings and correctly classifying it according to the SHI types defined in the task.
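As a rough illustration of the second subtask, the sketch below scans a transcript (such as the text output of an ASR system) for sensitive spans and assigns each a label. It is a minimal rule-based sketch, not the organizers' method or a competitive baseline: the labels `DATE` and `PHONE`, the regular expressions, and the function name `find_sensitive_spans` are hypothetical and chosen only for illustration. The official SHI types are those defined in the task.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    label: str   # hypothetical category name, not an official SHI type
    start: int   # character offset where the match begins
    end: int     # character offset just past the match
    text: str    # the matched text


# Hypothetical patterns for illustration only. A real system would use the
# SHI types defined by the task and, most likely, a trained model.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"),
    "PHONE": re.compile(r"\b\d{4} \d{4}\b"),
}


def find_sensitive_spans(transcript: str) -> List[Span]:
    """Return all pattern matches in the transcript, ordered by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(transcript):
            spans.append(Span(label, match.start(), match.end(), match.group()))
    return sorted(spans, key=lambda span: span.start)


if __name__ == "__main__":
    example = "The patient was admitted on 3 March 2021 and can be reached on 5550 1234."
    for span in find_sensitive_spans(example):
        print(span.label, span.start, span.end, span.text)
```

Keeping character offsets alongside each label makes it possible to map a detected span back to its position in the transcript, which is useful if predictions need to be aligned with the source recording.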
Registration Period: 10 March 2025 ~ 30 May 2025
Registration opens, competition officially begins.
First Part of Training Dataset Download: 31 March 2025 ~ 06 June 2025
Phase 1: Download Training Dataset 01. Participants can upload validation set predictions up to three times daily. Only audio files are provided for the validation set.
Second Part of Training Dataset Download: 28 April 2025 ~ 06 June 2025
Phase 2: Download Training Dataset 02. Participants can upload validation set predictions up to three times daily. Only audio files are provided for the validation set.
Validation Dataset Download: 12 May 2025 ~ 06 June 2025
Phase 3: Download the Validation Dataset answers. Participants can upload validation set predictions up to three times daily.
Competition Test Dataset Download and Prediction Upload: 07 June 2025 12:00 PM ~ 08 June 2025 12:00 PM
Phase 4: Upload Private Dataset answers starting 07 June at 12:00 PM. The deadline is 08 June at 12:00 PM. Up to three uploads are allowed per day. Participants must manually add their best score to the Leaderboard.
Results Announcement: 09 June 2025 12:00 PM
Private Leaderboard scores announced. Tie-breaking rules apply if scores are tied.
Report Upload: 09 June 2025 12:00 PM ~ 16 June 2025 12:00 PM
Top 30 teams on the Private Leaderboard must submit a report on their prediction model, including documentation and custom training datasets.
Final Ranking Announcement: 16 July 2025
Final competition rankings announced.
Workshop at MedInfo 2025: To be announced
Details of the workshop will be announced separately.
Award Ceremony: Early 2026
Details to be announced.
Registration
Please register at https://www.codabench.org/competitions/4890/#/pages-tab