Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English.

Chinese generative AI models (DeepSeek and Qwen) rival ChatGPT-4 in ophthalmology queries with excellent performance in Arabic and English.

Sallam, Malik; Alasfoor, Israa M; Khalid, Shahad W; Al-Mulla, Rand I; Al-Farajat, Amwaj; Mijwil, Maad M; Zahrawi, Reem; Sallam, Mohammed; Egger, Jan; Al-Adwan, Ahmad S
Narra J 2025 Vol. 5 pp. e2371
41
sallam2025chinese

Abstract

The rapid evolution of generative artificial intelligence (genAI) has ushered in a new era of digital medical consultations, with patients turning to AI-driven tools for guidance. The emergence of Chinese-developed genAI models such as DeepSeek-R1 and Qwen-2.5 presented a challenge to the dominance of OpenAI's ChatGPT. The aim of this study was to benchmark the performance of Chinese genAI models against ChatGPT-40 and to assess disparities in performance across English and Arabic. Following the METRICS checklist for genAI evaluation, Qwen-2.5, DeepSeek-R1, and ChatGPT-40 were assessed for completeness, accuracy, and relevance using the CLEAR tool in common patient ophthalmology queries. In English, Qwen-2.5 demonstrated the highest overall performance (CLEAR score: 4.43 ± 0.28), outperforming both DeepSeek-R1 (4.3 ± 0.43) and ChatGPT-40 (4.14 ± 0.41), with  = 0.002. A similar hierarchy emerged in Arabic, with Qwen-2.5 again leading (4.40 ± 0.29), followed by DeepSeek-R1 (4.20 ± 0.49) and ChatGPT-40 (4.14 ± 0.41), with  = 0.007. Each tested genAI model exhibited near-identical performance across the two languages, with ChatGPT-40 demonstrating the most balanced linguistic capabilities ( = 0.957), while Qwen-2.5 and DeepSeek-R1 showed a marginal superiority for English. An in-depth examination of genAI performance across key CLEAR components revealed that Qwen-2.5 consistently excelled in content completeness, factual accuracy, and relevance in both English and Arabic, setting a new benchmark for genAI in medical inquiries. Despite minor linguistic disparities, all three models exhibited robust multilingual capabilities, challenging the long-held assumption that genAI is inherently biased toward English. These findings highlight the evolving nature of AI-driven medical assistance, with Chinese genAI models being able to rival or even surpass ChatGPT-40 in ophthalmology-related queries.

Citation

ID: 283213
Ref Key: sallam2025chinese
Use this key to autocite in SciMatic or Thesis Manager

References

Blockchain Verification

Account:
NFT Contract Address:
0x95644003c57E6F55A65596E3D9Eac6813e3566dA
Article ID:
283213
Unique Identifier:
10.52225/narra.v5i1.2371
Network:
Scimatic Chain (ID: 481)
Loading...
Blockchain Readiness Checklist
Authors
Abstract
Journal Name
Year
Title
5/5
Creates 1,000,000 NFT tokens for this article
Token Features:
  • ERC-1155 Standard NFT
  • 1 Million Supply per Article
  • Transferable via MetaMask
  • Permanent Blockchain Record
Blockchain QR Code
Scan with Saymatik Web3.0 Wallet

Saymatik Web3.0 Wallet