Can Multimodal AI Replace Radiologists?

15 Apr 2025

Authors:

(1) Jinge Wang, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA;

(2) Zien Cheng, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA;

(3) Qiuming Yao, School of Computing, University of Nebraska-Lincoln, Lincoln, NE 68588, USA;

(4) Li Liu, College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA and Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA;

(5) Dong Xu, Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA;

(6) Gangqing Hu, Department of Microbiology, Immunology & Cell Biology, West Virginia University, Morgantown, WV 26506, USA.

Table of Links

Abstract and 1. Introduction

2. Omics

3. Genetics

4. Biomedical Text Mining and 4.1. Performance Assessments across typical tasks

4.2. Biological pathway mining

5. Drug Discovery

5.1. Human-in-the-Loop and 5.2. In-context Learning

5.3. Instruction Finetuning

6. Biomedical Image Understanding

7. Bioinformatics Programming

7.1. Application in Applied Bioinformatics

7.2. Biomedical Database Access

7.3. Online tools for Coding with ChatGPT

7.4. Benchmarks for Bioinformatics Coding

8. Chatbots in Bioinformatics Education

9. Discussion and Future Perspectives

Author Contributions, Acknowledgements, Conflict of Interest Statement, Ethics Statement, and References

6. BIOMEDICAL IMAGE UNDERSTANDING

Multimodal AI models have recently garnered significant attention in biomedical research [76]. Since its release in late September 2023, GPT-4V(ision) has been the subject of numerous studies exploring its application to image-related tasks across various biomedical topics [77-83]. On biomedical images, GPT-4V performs on par with professionals in Medical Visual Question Answering [81, 82] and outperforms traditional image models in biomedical image classification [84]. On scientific figures, GPT-4V can proficiently explain various plot types and apply domain knowledge to enrich its interpretations [85].

Despite this impressive performance, current evaluations reveal significant limitations. OpenAI acknowledges that GPT-4V has difficulty distinguishing closely spaced text and may state factual errors in an authoritative tone [86]. The model also struggles to perceive the colors, quantities, and spatial relationships of visual patterns in scientific figures [85]. When GPT-4V draws on domain knowledge to interpret an image, its answers may suffer from “confirmation bias” [87]: either the observation or conclusion is incorrect while the supporting knowledge is valid [85], or the observation or conclusion is correct while the supporting knowledge is invalid or irrelevant [88]. Such biases are particularly concerning because users without the requisite expertise can easily be misled by these plausible-sounding responses.

Prompt engineering has been instrumental in improving AI responses to text inputs. The emergence of GPT-4V highlights the need for equivalent methods for visual inputs, so that chatbots can better comprehend content across modalities. The field of computer vision has already made some progress in this direction [89]. Yang et al. [90] propose visual referring prompting (VRP), which sets visual pointer references by directly editing the input image, thereby augmenting textual prompts with visual cues. VRP has proven effective in preliminary case studies and has motivated benchmarks such as VRPTEST [91] to evaluate its efficacy. Yet a thorough, quantitative assessment of VRP's impact on GPT-4V's understanding of biomedical images is still lacking.
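Visual referring prompting pairs two things: an edit to the input image that adds a visual cue, and a textual prompt that refers to that cue. The following is a minimal, hypothetical sketch of that pairing in plain Python, not the implementation used by Yang et al. [90]. It assumes an image represented as a grid of RGB tuples, uses a red circle outline as the marker (arrows or boxes would serve the same purpose), and builds a question that points at the circle. The function names `draw_circle_marker` and `build_vrp_prompt`, the prompt wording, and the blank placeholder image are all illustrative assumptions; the call that would send the pair to GPT-4V is deliberately omitted.

```python
from typing import List, Tuple

# Assumed image representation (illustrative only): a row-major grid where
# image[y][x] is an (R, G, B) tuple.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

RED: Pixel = (255, 0, 0)


def draw_circle_marker(image: Image, cx: int, cy: int, radius: int,
                       color: Pixel = RED, thickness: int = 1) -> Image:
    """Return a copy of `image` with a circle outline drawn around (cx, cy).

    This is the visual-cue half of VRP: the pointer is written into the
    pixels themselves, so any model that receives the image also sees it.
    """
    marked = [row[:] for row in image]  # copy rows; the input image is left unchanged
    outer_sq = radius * radius
    inner_sq = max(radius - thickness, 0) ** 2
    for y, row in enumerate(marked):
        for x in range(len(row)):
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            # Color only pixels lying in the ring between the inner and outer radii.
            if inner_sq < dist_sq <= outer_sq:
                row[x] = color
    return marked


def build_vrp_prompt(question: str, marker: str = "red circle") -> str:
    """Return the textual half of VRP: a question that refers to the drawn marker."""
    return f"Focus on the region inside the {marker} in this image. {question}"


if __name__ == "__main__":
    # A blank 9x9 placeholder stands in for a real biomedical image (hypothetical).
    blank = [[(0, 0, 0)] * 9 for _ in range(9)]
    marked = draw_circle_marker(blank, cx=4, cy=4, radius=3)
    prompt = build_vrp_prompt("What structure appears there?")
    # The marked image and the prompt would be sent together to a multimodal
    # model such as GPT-4V; that API call is intentionally left out.
    for row in marked:
        print("".join("o" if px == RED else "." for px in row))
    print(prompt)
```

Because the cue lives in the pixels rather than in a separate coordinate field, this style of prompting works with any model that accepts an image, which is much of its practical appeal. A real pipeline would more likely draw on actual image files with an imaging library (for example Pillow's `ImageDraw`) than on a hand-rolled pixel grid.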

This paper is available on arXiv under the CC BY 4.0 DEED license.
