MRI India Journals Vol. 13 No. 2 (2026)

Hybrid CNN–Transformer Networks for Intelligent Medical Image Diagnosis and Classification

Authors

  • Wanchai Yamashiro, Department of Computer Science and Engineering, Shiraz College of Systems and Management, Iran

Keywords:

Hybrid CNN–Transformer, Medical Image Diagnosis, Medical Image Classification, Deep Learning, Vision Transformer, Intelligent Healthcare Systems

Abstract

Deep learning approaches, particularly Convolutional Neural Networks (CNNs), have demonstrated remarkable success in extracting local spatial features from medical images such as X-rays, CT scans, MRI scans, histopathological slides, and ultrasound images. However, CNN-based architectures exhibit limitations in capturing long-range dependencies and global contextual relationships within medical images. Recently, Transformer-based architectures have emerged as powerful models capable of learning global contextual representations using self-attention mechanisms. Nevertheless, pure Transformer models often require large-scale datasets and high computational resources, which limit their effectiveness in medical imaging applications with limited annotated data. This research proposes a hybrid CNN–Transformer framework for intelligent medical image diagnosis and classification by integrating the local feature extraction capability of CNNs with the global contextual learning ability of Transformer networks. The proposed architecture utilizes convolutional layers for hierarchical spatial feature extraction followed by Transformer encoder modules for contextual representation learning and attention-based feature optimization. The framework incorporates adaptive attention mechanisms, feature fusion strategies, and intelligent classification layers to improve diagnostic accuracy and robustness across multiple medical imaging modalities. Experimental evaluation is performed using benchmark medical image datasets for disease classification, tumor detection, and abnormality identification tasks. The experimental results demonstrate that the proposed hybrid CNN–Transformer framework significantly outperforms conventional CNN, Vision Transformer (ViT), and classical deep learning models in terms of classification accuracy, precision, recall, F1-score, and computational efficiency. 
The proposed model achieves improved generalization capability by effectively combining local texture analysis with long-range contextual dependency learning. Furthermore, the framework enhances diagnostic interpretability through attention visualization mechanisms and adaptive feature learning. The findings indicate that hybrid CNN–Transformer architectures can provide highly efficient and scalable solutions for next-generation intelligent healthcare systems, computer-aided diagnosis, and automated clinical decision support applications.
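
The data flow described in the abstract (convolutional feature extraction, followed by self-attention over the resulting feature map, followed by a classification layer) can be illustrated with a minimal forward-pass sketch. This is not the author's implementation: the layer sizes, the three-class output, the single attention head, the random untrained weights, and all function names are assumptions made for illustration. Positional encodings and the Transformer feed-forward sublayer are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(x, w, stride=2):
    """Valid 2D convolution followed by ReLU.
    x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract channel and kernel axes: one value per output channel.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return np.maximum(out, 0.0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over (N, D) tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N)
    return attn @ v, attn

def hybrid_forward(image, params):
    # 1) CNN stem: local, hierarchical spatial features.
    feat = conv2d_relu(image, params["conv1"])
    feat = conv2d_relu(feat, params["conv2"])
    c, h, w = feat.shape
    # 2) Flatten the feature map into one token per spatial location.
    tokens = feat.reshape(c, h * w).T  # (N, C)
    # 3) Self-attention with a residual connection: global context.
    attended, attn = self_attention(
        layer_norm(tokens), params["wq"], params["wk"], params["wv"]
    )
    tokens = tokens + attended
    # 4) Mean-pool the tokens and classify.
    pooled = layer_norm(tokens).mean(axis=0)
    return softmax(pooled @ params["head"]), attn

# Hypothetical sizes: 1-channel 32x32 input, 16-dim tokens, 3 classes.
params = {
    "conv1": 0.1 * rng.standard_normal((8, 1, 3, 3)),
    "conv2": 0.1 * rng.standard_normal((16, 8, 3, 3)),
    "wq": 0.1 * rng.standard_normal((16, 16)),
    "wk": 0.1 * rng.standard_normal((16, 16)),
    "wv": 0.1 * rng.standard_normal((16, 16)),
    "head": 0.1 * rng.standard_normal((16, 3)),
}
image = rng.standard_normal((1, 32, 32))  # stand-in for a grayscale scan
probs, attn = hybrid_forward(image, params)
print(probs.shape, attn.shape)  # (3,) (49, 49)

# Average attention received by each location, reshaped to the 7x7 grid.
attn_map = attn.mean(axis=0).reshape(7, 7)
```

In this sketch the two strided convolutions reduce the 32×32 input to a 7×7 grid of 16-dimensional tokens, so attention is computed over 49 tokens rather than over every pixel. The `attn_map` array is one simple way to obtain the kind of attention visualization the abstract mentions; the paper's actual visualization method is not specified on this page.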

 

Published

2026-05-28

How to Cite

Yamashiro, W. (2026). Hybrid CNN–Transformer Networks for Intelligent Medical Image Diagnosis and Classification. Multidisciplinary Journal of Research in Engineering and Technology, 13(2), 48–54. Retrieved from https://journals.mriindia.com/index.php/mjret/article/view/3167

Issue

Section

Articles
