Future Perspectives in the Era of Large Language Models, and References

With the emergence of large language models such as ChatGPT and its successors, there is an urgent need for comprehensive evaluation of their capabilities in biomedical text mining and healthcare information processing. These capabilities include medical information retrieval, medical language understanding, medical text generation, medical knowledge question answering, clinical decision reasoning, and more. To address this need, evaluation benchmarks such as PromptCBLUE [50, 59], CMB [72], and MedBench [73] have been proposed.

Handling multi-source and multimodal data

Biomedical text data come from a wide range of sources, including the published literature, electronic medical records, textbooks, and social media. Biomedical data also exist in other modalities such as images, video, and audio. Using these diverse sources and modalities together can provide more comprehensive information for diagnosis, treatment, and research [39]. Evaluating a model's ability to integrate multiple sources and process multimodal data is therefore crucial for future community challenges. This may involve tasks such as cross-modal information fusion and cross-modal querying and retrieval.
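As a rough illustration of what a cross-modal retrieval task might measure, the sketch below ranks items from one modality against a query from another by cosine similarity in a shared embedding space. The vectors and item names are toy values invented for this example; in practice the embeddings would come from trained text and image encoders, which are assumed here and not shown.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_items(query_vec: List[float],
               item_vecs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (item name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in item_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical image embeddings and a hypothetical text-query embedding.
image_embeddings = {"chest_xray": [1.0, 0.0], "ecg_trace": [0.0, 1.0]}
text_query = [0.9, 0.1]
ranking = rank_items(text_query, image_embeddings)
```

A challenge could then score such rankings with standard retrieval metrics; which metric is appropriate depends on the task design.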

Leveraging domain-specific knowledge

The field of biomedicine has a large number of software packages, tools, and knowledge bases. Integrating their functionality into large language models can greatly enhance the models' capabilities. For example, a model can automatically invoke appropriate software and tools to parse, annotate, and analyze biomedical data [74]. Moreover, a model can retrieve relevant information from knowledge bases to generate more accurate responses and reduce erroneous output.
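The idea of a model invoking tools and knowledge bases can be sketched as a small dispatcher that routes a model-emitted call to a registered function. This is a minimal sketch and not the mechanism used in [74]: the call format `tool_name: argument`, the tool names, and the one-entry knowledge base are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict

# Hypothetical toy knowledge base; a real system would query a curated resource.
KNOWLEDGE_BASE: Dict[str, str] = {
    "aspirin": "Aspirin is a nonsteroidal anti-inflammatory drug.",
}


def lookup_knowledge(term: str) -> str:
    """Return the stored fact for a term, or an explicit 'not found' message."""
    return KNOWLEDGE_BASE.get(term.lower(), f"No entry found for '{term}'.")


def count_tokens(text: str) -> str:
    """Stand-in for an analysis tool: report the whitespace token count."""
    return str(len(text.split()))


# Registry mapping tool names to callables (names are assumptions).
TOOLS: Dict[str, Callable[[str], str]] = {
    "kb_lookup": lookup_knowledge,
    "token_count": count_tokens,
}


def dispatch(tool_call: str) -> str:
    """Run a call of the form 'tool_name: argument' as a model might emit it."""
    name, _, argument = tool_call.partition(":")
    tool = TOOLS.get(name.strip())
    if tool is None:
        return f"Unknown tool '{name.strip()}'."
    return tool(argument.strip())
```

Returning an explicit message for unknown tools or missing entries, instead of guessing, is one simple way to keep the model's final answer grounded in what the tools actually returned.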

Data privacy and security

The collection and sharing of biomedical data raise concerns about data privacy and security [75]. It is necessary to consider how to protect sensitive information while performing text mining and information processing using large language models. Future community challenges should include evaluation tasks related to data privacy and security to ensure responsible development and deployment of these models.
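One common way to protect sensitive information is to mask obvious identifiers before text is passed to a model. The sketch below uses a few regular expressions for dates, phone numbers, and e-mail addresses. The patterns and placeholder labels are assumptions made for illustration and fall well short of a complete de-identification method, since names, addresses, and record numbers are not covered.

```python
import re

# Illustrative patterns only; real de-identification needs far broader coverage.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask_identifiers(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 2023-05-01, call 555-123-4567 or a.b@example.org."
masked = mask_identifiers(note)
```

An evaluation task on privacy could, for instance, check how many annotated identifiers remain in the text after such a masking step.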

Interpretability

Interpreting and understanding the predictions of models is crucial in clinical practice. Reliable explanations and evidence-based support are necessary for clinical decision-making. However, evaluating the interpretability of models is a significant challenge. This may involve explaining the model's decision-making process, providing reliable evidence and explanations for its predictions, and interacting effectively with clinical experts so that the model's predictions are correctly understood or adjusted based on their knowledge and experience.

Integration with translational applications

Applying large language models to clinical practice is an important goal. Evaluating the model's ability to integrate with clinical translational applications is a key challenge. This may involve integrating the model into medical knowledge bases, hospital information systems, clinical decision support systems, or other clinical workflows. By doing so, the translation of research findings into practical healthcare solutions can be accelerated.

Acknowledgements

We thank the community challenges for providing an open platform, the organizers of evaluation tasks for defining tasks and providing datasets, and the participants of the tasks for developing algorithms or systems.

Author contributions

HZ and RW performed data collection, investigation, and analysis. HZ drafted the manuscript. RW revised the manuscript. JC, EW, JL and LT provided constructive suggestions and technical assistance. BS obtained the funding. ZL, BT and BS supervised the study. All authors have read and agreed to the published version of the manuscript.

Competing interests

The authors declare no competing interests.

Data availability

No datasets were created in this study. The datasets used in the evaluation tasks of the community challenges can be found on the related websites or in the related papers.

Funding

This work was supported by the National Natural Science Foundation of China (32270690 and 32070671).

References

[1] Lu, Z., PubMed and beyond: a survey of web tools for searching biomedical literature. Database (Oxford), 2011. 2011: p. baq036.
[2] Chen, Q., et al., LitCovid in 2022: an information resource for the COVID-19 literature. Nucleic Acids Res, 2023. 51(D1): p. D1512-D1518.
[3] Wang, Y., et al., A knowledge empowered explainable gene ontology fingerprint approach to improve gene functional explication and prediction. iScience, 2023. 26(4): p. 106356.
[4] Shen, L., et al., The fourth scientific discovery paradigm for precision medicine and healthcare: Challenges ahead. Precis Clin Med, 2021. 4(2): p. 80-84.
[5] Bekhuis, T., Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy. Biomed Digit Libr, 2006. 3: p. 2.
[6] Gopalakrishnan, V., et al., Towards self-learning based hypotheses generation in biomedical text domain. Bioinformatics, 2018. 34(12): p. 2103-2115.
[7] He, J., et al., The practical implementation of artificial intelligence technologies in medicine. Nat Med, 2019. 25(1): p. 30-36.
[8] Shortliffe, E.H. and M.J. Sepulveda, Clinical Decision Support in the Era of Artificial Intelligence. JAMA, 2018. 320(21): p. 2199-2200.
[9] Zhu, F., et al., Biomedical text mining and its applications in cancer research. J Biomed Inform, 2013. 46(2): p. 200-11.
[10] Przybyla, P., et al., Text mining resources for the life sciences. Database (Oxford), 2016. 2016.
[11] Allot, A., et al., LitSense: making sense of biomedical literature at sentence level. Nucleic Acids Res, 2019. 47(W1): p. W594-W599.
[12] Wei, C.H., et al., PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res, 2019. 47(W1): p. W587-W593.
[13] Zhao, S., et al., Recent advances in biomedical literature mining. Brief Bioinform, 2021. 22(3).
[14] Wei, C.H., H.Y. Kao, and Z. Lu, PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Res, 2013. 41(Web Server issue): p. W518-22.
[15] Dogan, R.I., R. Leaman, and Z. Lu, NCBI disease corpus: a resource for disease name recognition and concept normalization. J Biomed Inform, 2014. 47: p. 1-10.
[16] Wei, C.H., et al., tmVar: a text mining approach for extracting sequence variants in biomedical literature. Bioinformatics, 2013. 29(11): p. 1433-9.
[17] Lei, J., et al., A comprehensive study of named entity recognition in Chinese clinical text. J Am Med Inform Assoc, 2014. 21(5): p. 808-14.
[18] Yang, X., et al., Clinical concept extraction using transformers. J Am Med Inform Assoc, 2020. 27(12): p. 1935-1942.
[19] Hu, Y., et al., Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach. Bioinformatics, 2023. 39(9).
[20] Krallinger, M., et al., The CHEMDNER corpus of chemicals and drugs and its annotation principles. J Cheminform, 2015. 7(Suppl 1 Text mining for chemistry and the CHEMDNER track): p. S2.
[21] Luo, L., et al., BioRED: a rich biomedical relation extraction dataset. Brief Bioinform, 2022. 23(5).
[22] Li, J., et al., BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database (Oxford), 2016. 2016.
[23] Liu, S., et al., Drug-Drug Interaction Extraction via Convolutional Neural Networks. Comput Math Methods Med, 2016. 2016: p. 6918381.
[24] Chen, J., et al., Biomedical relation extraction via knowledge-enhanced reading comprehension. BMC Bioinformatics, 2022. 23(1): p. 20.
[25] Zong, H., et al., Semantic categorization of Chinese eligibility criteria in clinical trials using machine learning methods. BMC Med Inform Decis Mak, 2021. 21(1): p. 128.
[26] Chen, Q., et al., Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database (Oxford), 2022. 2022.
[27] Fiorini, N., et al., Best Match: New relevance search for PubMed. PLoS Biol, 2018. 16(8): p. e2005343.
[28] Chen, Y., et al., Prostate cancer management with lifestyle intervention: From knowledge graph to Chatbot. Clinical and Translational Discovery, 2022. 2(1): p. e29.
[29] Chakraborty, C., M. Bhattacharya, and S.S. Lee, Artificial intelligence enabled ChatGPT and large language models in drug target discovery, drug discovery, and development. Mol Ther Nucleic Acids, 2023. 33: p. 866-868.
[30] Malgaroli, M., et al., Natural language processing for mental health interventions: a systematic review and research framework. Transl Psychiatry, 2023. 13(1): p. 309.
[31] Liu, S., et al., SHAPE: A Sample-Adaptive Hierarchical Prediction Network for Medication Recommendation. IEEE J Biomed Health Inform, 2023. 27(12): p. 6018-6028.
[32] Liu, M., et al., Large-scale prediction of adverse drug reactions using chemical, biological, and phenotypic properties of drugs. J Am Med Inform Assoc, 2012. 19(e1): p. e28-35.
[33] Xiong, Y., et al., A Unified Machine Reading Comprehension Framework for Cohort Selection. IEEE J Biomed Health Inform, 2022. 26(1): p. 379-387.
[34] Stubbs, A., et al., Cohort selection for clinical trials: n2c2 2018 shared task track 1. J Am Med Inform Assoc, 2019. 26(11): p. 1163-1171.
[35] Xiong, Y., et al., Cohort selection for clinical trials using hierarchical neural network. J Am Med Inform Assoc, 2019. 26(11): p. 1203-1208.
[36] Singhal, A., M. Simmons, and Z. Lu, Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine. PLoS Comput Biol, 2016. 12(11): p. e1005017.
[37] Tong, Y., et al., ViMRT: a text-mining tool and search engine for automated virus mutation recognition. Bioinformatics, 2023. 39(1).
[38] Li, P.H., et al., pubmedKB: an interactive web server for exploring biomedical entity relations in the biomedical literature. Nucleic Acids Res, 2022. 50(W1): p. W616-W622.
[39] Kline, A., et al., Multimodal machine learning in precision health: A scoping review. NPJ Digit Med, 2022. 5(1): p. 171.
[40] Zong, H., et al., Performance of ChatGPT on Chinese national medical licensing examinations: a five-year examination evaluation study for physicians, pharmacists and nurses. BMC Med Educ, 2024. 24(1): p. 143.
[41] Wornow, M., et al., The shaky foundations of large language models and foundation models for electronic health records. NPJ Digit Med, 2023. 6(1): p. 135.
[42] Thirunavukarasu, A.J., et al., Large language models in medicine. Nat Med, 2023. 29(8): p. 1930-1940.
[43] Huang, C.C. and Z. Lu, Community challenges in biomedical text mining over 10 years: success, failure and the future. Brief Bioinform, 2016. 17(1): p. 132-44.
[44] Roberts, K., et al., Searching for scientific evidence in a pandemic: An overview of TREC-COVID. J Biomed Inform, 2021. 121: p. 103865.
[45] Mahajan, D., et al., Overview of the 2022 n2c2 shared task on contextualized medication event extraction in clinical notes. J Biomed Inform, 2023. 144: p. 104432.
[46] Li, Z., et al. CHIP2022 Shared Task Overview: Medical Causal Entity Relationship Extraction. in Health Information Processing. Evaluation Track Papers. 2023. Singapore: Springer Nature Singapore.
[47] Luo, G., et al. Overview of CHIP 2022 Shared Task 5: Clinical Diagnostic Coding. in Health Information Processing. Evaluation Track Papers. 2023. Singapore: Springer Nature Singapore.
[48] Ouyang, S., et al. Text Mining Task for “Gene-Disease” Association Semantics in CHIP 2022. in Health Information Processing. Evaluation Track Papers. 2023. Singapore: Springer Nature Singapore.
[49] Zhu, W., et al. Extracting Decision Trees from Medical Texts: An Overview of the Text2DT Track in CHIP2022. in Health Information Processing. Evaluation Track Papers. 2023. Singapore: Springer Nature Singapore.
[50] Zhu, W., et al. Overview of the PromptCBLUE Shared Task in CHIP2023. 2023. arXiv:2312.17522 DOI: 10.48550/arXiv.2312.17522.
[51] Han, X., et al., Overview of the CCKS 2019 knowledge graph evaluation track: entity, relation, event and QA. arXiv preprint arXiv:2003.03875, 2020.
[52] Li, X., et al., Overview of CCKS 2020 Task 3: named entity recognition and event extraction in Chinese electronic medical records. Data Intelligence, 2021. 3(3): p. 376-388.
[53] Xia, Y. and Q. Wang. Clinical named entity recognition: ECUST in the CCKS-2017 shared task 2. in CEUR workshop proceedings. 2017.
[54] Zhang, J., et al. Overview of CCKS 2018 Task 1: named entity recognition in Chinese electronic medical records. in Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding: 4th China Conference, CCKS 2019, Hangzhou, China, August 24–27, 2019, Revised Selected Papers 4. 2019. Springer.
[55] Ma, C. and W. Huang. Named Entity Recognition and Event Extraction in Chinese Electronic Medical Records. 2022. Singapore: Springer Singapore.
[56] Jia, T., et al., Link prediction based on tensor decomposition for the knowledge graph of COVID-19 antiviral drug. Data Intelligence, 2022. 4(1): p. 134-148.
[57] Qin, B., et al., CCKS 2021-evaluation track. 2022: Springer.
[58] Wang, Y., et al. End-to-End Pre-trained Dialogue System for Automatic Diagnosis. 2022. Singapore: Springer Singapore.
[59] Zhu, W., et al. PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain. 2023. arXiv:2310.14151 DOI: 10.48550/arXiv.2310.14151.
[60] Ling, H., et al. Advanced PromptCBLUE Performance: A Novel Approach Leveraging Large Language Models. in Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence. 2023. Singapore: Springer Nature Singapore.
[61] Zong, H., et al., [Artificial intelligence based Chinese clinical trials eligibility criteria classification]. Sheng Wu Yi Xue Gong Cheng Xue Za Zhi, 2021. 38(1): p. 105-110.
[62] Hongying, Z., et al. Building a pediatric medical corpus: Word segmentation and named entity annotation. in Chinese Lexical Semantics: 21st Workshop, CLSW 2020, Hong Kong, China, May 28–30, 2020, Revised Selected Papers 21. 2021. Springer.
[63] Guan, T., et al. CMeIE: Construction and evaluation of Chinese medical information extraction dataset. in Natural Language Processing and Chinese Computing: 9th CCF International Conference, NLPCC 2020, Zhengzhou, China, October 14–18, 2020, Proceedings, Part I 9. 2020. Springer.
[64] Zhang, N., et al. CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark. 2021. arXiv:2106.08087 DOI: 10.48550/arXiv.2106.08087.
[65] Liu, L., et al. Information Extraction of Medical Materials: An Overview of the Track of Medical Materials MedOCR. 2023. Singapore: Springer Nature Singapore.
[66] Ma, M.W., et al., Extracting laboratory test information from paper-based reports. BMC Med Inform Decis Mak, 2023. 23(1): p. 251.
[67] Liu, W., et al., MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware Medical Dialogue Generation, in Natural Language Processing and Chinese Computing: 11th CCF International Conference, NLPCC 2022, Guilin, China, September 24–25, 2022, Proceedings, Part I. 2022, Springer-Verlag: Guilin, China. p. 447–459.
[68] Devlin, J., et al., BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[69] Hu, D., et al., Zero-shot information extraction from radiological reports using ChatGPT. Int J Med Inform, 2024. 183: p. 105321.
[70] Pinero, J., et al., The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res, 2020. 48(D1): p. D845-D855.
[71] Marshall, I.J., et al., Trialstreamer: A living, automatically updated database of clinical trial reports. J Am Med Inform Assoc, 2020. 27(12): p. 1903-1912.
[72] Wang, X., et al., CMB: A comprehensive medical benchmark in Chinese. arXiv preprint arXiv:2308.08833, 2023.
[73] Cai, Y., et al., MedBench: A large-scale Chinese benchmark for evaluating medical large language models. arXiv preprint arXiv:2312.12806, 2023.
[74] Schick, T., et al., Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 2024. 36.
[75] Price, W.N., 2nd and I.G. Cohen, Privacy in the age of medical big data. Nat Med, 2019. 25(1): p. 37-43.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Hui Zong, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (contributed equally with Rongrong Wu);

(2) Rongrong Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (contributed equally with Hui Zong);

(3) Jiaxue Cha, Shanghai Key Laboratory of Signaling and Disease Research, Laboratory of Receptor-Based Bio-Medicine, Collaborative Innovation Center for Brain Science, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China;

(4) Erman Wu, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China;

(5) Jiakun Li, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China and Department of Urology, West China Hospital, Sichuan University, Chengdu, 610041, China;

(6) Liang Tao, Faculty of Business Information, Shanghai Business School, Shanghai, 201400, China;

(7) Zuofeng Li, Takeda Co. Ltd., Shanghai, 200040, China;

(8) Buzhou Tang, Department of Computer Science, Harbin Institute of Technology, Shenzhen, 518055, China;

(9) Bairong Shen, Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease related Molecular Network, West China Hospital, Sichuan University, Chengdu, 610041, China (corresponding author).