2. Community Challenges Overview
Figure 1 presents a timeline of these challenges from 2017 to 2023. Each community challenge is shown with a different background color; challenge names appear in white, and the specific shared tasks within each challenge appear in black. Challenge tasks were first introduced to the Chinese biomedical text mining community by the China Conference on Knowledge Graph and Semantic Computing (CCKS) in 2017 and subsequently gained prominence through the China Health Information Processing Conference (CHIP). Other organizers, such as the Chinese Conference on Information Retrieval (CCIR), the Chinese Society of Medical Information (CSMI), the China National Conference on Computational Linguistics (CCL), and the Digital China Innovation Contest (DCIC), have also contributed in recent years.
2.1 CCKS
As shown in Table 1, in 2017 CCKS held the first Chinese Biomedical Text Mining Evaluation, focusing on clinical named entity recognition in electronic medical records [53]. Clinical named entity recognition is a basic Natural Language Processing (NLP) task, and CCKS held it for five consecutive years, from 2017 to 2021 [51-55]. Entity attribute recognition was added in 2019 [51], and event extraction in 2020 [52]. Also in 2020, knowledge graph construction and question-answering tasks specifically related to COVID-19 were organized [56]. In 2021, CCKS expanded its scope to several task types: medical entity recognition and event extraction; link prediction in a multi-level knowledge graph involving phenotypes, drugs, and molecules; generation of medical dialogues containing implicit entities; and reading comprehension of medical popular science knowledge [55, 57, 58]. In 2023, based on the Chinese Biomedical Language Understanding Evaluation (CBLUE) dataset, CCKS transformed 16 different NLP tasks in medical scenarios into prompt-based language generation tasks, establishing the first Chinese benchmark for evaluating language models in the medical domain [59, 60]. It is worth noting that the evaluation tasks organized in the CCKS community challenge cover a broad range of domains, of which biomedicine and healthcare is only one.
This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.
Advancing Biomedical Text Mining with Community Challenges
Authors:
(1) Hui Zong, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (contributed equally);
(2) Rongrong Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (contributed equally);
(3) Jiaxue Cha, Shanghai Key Laboratory of Signaling and Disease Research, Laboratory of Receptor-Based Bio-Medicine, Collaborative Innovation Center for Brain Science, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China;
(4) Erman Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China;
(5) Jiakun Li, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China and Department of Urology, West China Hospital, Sichuan University, Chengdu, 610041, China;
(6) Liang Tao, Faculty of Business Information, Shanghai Business School, Shanghai, 201400, China;
(7) Zuofeng Li, Takeda Co. Ltd., Shanghai, 200040, China;
(8) Buzhou Tang, Department of Computer Science, Harbin Institute of Technology, Shenzhen, 518055, China;
(9) Bairong Shen, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (corresponding author).