2025 SREDH/AI CUP

Recent advances in generative artificial intelligence (AI) and natural language processing (NLP) have positioned large language models (LLMs) as transformative tools across various industries, particularly healthcare. These technologies can enhance patient care, streamline administrative tasks, and advance medical research by analyzing extensive clinical data from electronic health records (EHRs), medical imaging, and genomics. However, applying LLMs in clinical medicine raises privacy concerns, particularly about the inadvertent leakage of confidential information. EHRs contain sensitive patient data, so proper de-identification is crucial for advancing research. Integrating LLMs into healthcare therefore presents challenges in data privacy and in safeguarding sensitive health information (SHI) within medical documents. Sophisticated algorithms are needed to identify and remove SHI from unstructured clinical texts while accounting for medical context, terminology, and evolving data privacy regulations.

To address these challenges, the 2023 SREDH/AI CUP competition was held with two subtasks: SHI Recognition, which focused on recognizing SHI within clinical texts, and Temporal Information Normalization, which aimed to standardize temporal information to ensure consistency across medical records. An international workshop was also held in 2024 as a closing event, and its proceedings were published.

Building on the success of the SREDH/AI CUP 2023 competition, the 2025 competition aims to enhance the protection of sensitive information when AI applications are introduced in healthcare settings. It is organized into the following two sub-tasks.

Sub-task 1: Doctor-Patient Speech Recognition

Develop Automatic Speech Recognition (ASR) technology to convert spoken dialogue into text records.

Sub-task 2: Sensitive Health Information Recognition

Develop a system capable of identifying sensitive personal information mentioned in speech recordings and correctly classifying it according to the SHI types defined in the task.
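As a rough illustration of what Sub-task 2 asks for, the sketch below tags spans in a transcript and assigns each one a category. It is a minimal rule-based baseline, not the organizers' method. The labels DATE and PHONE, the regular expressions, and the names Span and find_shi are all hypothetical and chosen only for this example; the actual SHI types are those defined in the task.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    label: str   # category assigned to the span (hypothetical label set)
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    text: str    # the matched text


# Hypothetical categories and patterns, for illustration only.
# The real SHI types are defined by the task, not by this sketch.
MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
PATTERNS = {
    "DATE": re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_shi(transcript: str) -> List[Span]:
    """Return every pattern match in the transcript, ordered by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(transcript):
            spans.append(Span(label, match.start(), match.end(), match.group()))
    return sorted(spans, key=lambda s: s.start)


if __name__ == "__main__":
    # In the competition setting this text would come from the ASR output
    # of Sub-task 1; here it is a made-up sentence.
    example = "I saw the doctor on 12 May 2025, call me at 555-010-1234."
    for span in find_shi(example):
        print(span.label, span.start, span.end, span.text)
```

A competitive system would more likely use a trained sequence-labeling or LLM-based model; the regular expressions here only show the input and output shape of the problem (text in, labeled spans out).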

Key dates

Registration Period: 10 March 2025 ~ 30 May 2025

Registration opens and the competition officially begins.

First Part of Training Dataset Download: 31 March 2025 ~ 06 June 2025

Phase 1: Download Training Dataset 01. Participants can upload validation set predictions up to three times daily. Only audio files are provided for the validation set.

Second Part of Training Dataset Download: 28 April 2025 ~ 06 June 2025

Phase 2: Download Training Dataset 02. Participants can upload validation set predictions up to three times daily. Only audio files are provided for the validation set.

Validation Dataset Download: 12 May 2025 ~ 06 June 2025

Phase 3: Download the Validation Dataset answers. Participants can upload validation set predictions up to three times daily.

Competition Test Dataset Download and Prediction Upload: 07 June 2025 12:00 PM ~ 08 June 2025 12:00 PM

Phase 4: Upload Private Dataset answers starting 07 June at 12:00 PM. The deadline is 08 June at 12:00 PM. Uploads are limited to three per day. Participants must manually add their best score to the Leaderboard.

Results Announcement: 09 June 2025 12:00 PM