Biomedical Text Classification & Similarity: CHIP/CCKS Tasks

24 Apr 2025

Abstract and 1. Introduction

2. Community Challenges Overview and 2.1 CCKS

2.2 CHIP and 2.3 CCIR, CSMI, CCL and DCIC

3. Evaluation Tasks Overview and 3.1 Information Extraction

3.2 Text Classification and Text Similarity

3.3 Knowledge Graph and Question Answering

3.4 Text Generation and Knowledge Reasoning and 3.5 Large Language Model Evaluation

4. Translational Informatics in Biomedical Text Mining

5. Discussion and Perspective

5.1. Contributions of Community Challenges

5.2. Limitations of Current Community Challenges

5.3. Future Perspectives in the Era of Large Language Models, and References

Figure Legends and Tables

3.2 Text Classification and Text Similarity

Text classification refers to the task of assigning biomedical texts to categories based on their themes, types, or other characteristics. In 2019, CHIP introduced a clinical trial eligibility criteria text classification task, which aimed to classify criteria sentences into 44 predefined semantic categories [25, 61]. In 2021, CHIP released a task to classify clinical descriptions collected from the internet as positive or negative, based on their relevance to patient conditions. In 2023, CHIP released an internet diabetes consultation question classification task with six classes: diagnosis, treatment, common knowledge, healthy lifestyle, epidemiology, and other. In 2021, CSMI organized a task focused on public health questions, also with six classes: diagnosis, treatment, anatomy/physiology, epidemiology, healthy lifestyle, and physician selection. All of these tasks are single-label classification tasks, meaning that each text is assigned exactly one category.
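To make the single-label setting concrete, the following is a minimal sketch that maps a question to exactly one of the six CHIP 2023 classes. It assumes English stand-in questions and a hand-picked cue-word list (`CUES`) that is purely hypothetical; `classify` is an illustrative toy baseline, not the task's data format or any participant's system.

```python
import re

# The six classes defined for the CHIP 2023 diabetes consultation question task.
LABELS = ["diagnosis", "treatment", "common knowledge",
          "healthy lifestyle", "epidemiology", "other"]

# HYPOTHETICAL cue words for illustration only; not derived from any task data.
CUES = {
    "diagnosis": {"diagnose", "diagnosed", "test", "symptom", "symptoms"},
    "treatment": {"insulin", "medication", "drug", "dose", "metformin"},
    "common knowledge": {"what", "why", "mean", "means"},
    "healthy lifestyle": {"diet", "eat", "exercise", "sleep", "weight"},
    "epidemiology": {"prevalence", "incidence", "population", "rate"},
}


def classify(question: str) -> str:
    """Assign exactly one label to a question (single-label setting)."""
    tokens = re.findall(r"[a-z]+", question.lower())
    # Count how many tokens match each class's cue words.
    scores = {label: sum(t in cues for t in tokens)
              for label, cues in CUES.items()}
    # Highest-scoring class wins; max() keeps the first of tied classes,
    # so ties go to the class listed first in CUES.
    label, score = max(scores.items(), key=lambda kv: kv[1])
    # With no cue matched, fall back to the catch-all class.
    return label if score > 0 else "other"


print(classify("Can I eat rice if I have diabetes?"))  # healthy lifestyle
```

Whatever the scores, the function returns one label and never a set of labels, which is the defining property of the single-label tasks above. A multi-label variant would instead return every class whose score passes a threshold.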

Text similarity refers to measuring how similar two texts are in meaning or content. For instance, in the tasks released by CHIP in 2018 and 2019, corpora of patient health consultation questions were collected from the internet; given a pair of questions, the objective was to determine whether they had the same intent. In contrast, the task released by CCKS in 2021 focused on popular medical science knowledge: given a question and an answer, the objective was to judge whether they matched [57].
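Both settings share the same interface: a pair of texts goes in, and a binary label comes out. The sketch below illustrates that interface with a deliberately simple lexical baseline, Jaccard similarity over character bigrams with a hypothetical threshold of 0.5. The function names and the threshold are assumptions for illustration, not the official task format or any participant's method.

```python
def char_bigrams(text: str) -> set[str]:
    """Set of character bigrams, ignoring case and whitespace.

    Character n-grams need no word segmentation, so this also works on Chinese text.
    """
    chars = "".join(text.lower().split())
    return {chars[i:i + 2] for i in range(len(chars) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A & B| / |A | B| of the two bigram sets."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x and not y:
        return 1.0  # neither text has any bigram: treat as a match by convention
    return len(x & y) / len(x | y)


def is_match(text1: str, text2: str, threshold: float = 0.5) -> int:
    """Binary pair label: 1 = same intent / matched, 0 = not (threshold is hypothetical)."""
    return int(jaccard(text1, text2) >= threshold)


print(is_match("Can diabetics eat sugar?", "How is insulin stored?"))  # 0
```

The baseline also shows why lexical overlap is a crude proxy. The paraphrases "糖尿病能吃糖吗" and "糖尿病人可以吃糖吗" (both roughly "can diabetics eat sugar?") share 4 of 10 distinct bigrams, a score of 0.4, so `is_match` labels them 0 despite their identical intent. In the question-answer setting, a correct answer may share even fewer characters with its question, so a learned semantic model would likely be needed in practice.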

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Hui Zong, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (equal contribution);

(2) Rongrong Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (equal contribution);

(3) Jiaxue Cha, Shanghai Key Laboratory of Signaling and Disease Research, Laboratory of Receptor-Based Bio-Medicine, Collaborative Innovation Center for Brain Science, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China;

(4) Erman Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China;

(5) Jiakun Li, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China and Department of Urology, West China Hospital, Sichuan University, Chengdu, 610041, China;

(6) Liang Tao, Faculty of Business Information, Shanghai Business School, Shanghai, 201400, China;

(7) Zuofeng Li, Takeda Co. Ltd., Shanghai, 200040, China;

(8) Buzhou Tang, Department of Computer Science, Harbin Institute of Technology, Shenzhen, 518055, China;

(9) Bairong Shen, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (corresponding author).