Hybrid CNN–Transformer Networks for Intelligent Medical Image Diagnosis and Classification

Wanchai Yamashiro

Authors

Wanchai Yamashiro, Department of Computer Science and Engineering, Shiraz College of Systems and Management, Iran

Keywords:

Hybrid CNN–Transformer, Medical Image Diagnosis, Medical Image Classification, Deep Learning, Vision Transformer, Intelligent Healthcare Systems

Abstract

Deep learning approaches, particularly Convolutional Neural Networks (CNNs), have been remarkably successful at extracting local spatial features from medical images such as X-rays, CT scans, MRI scans, histopathological slides, and ultrasound images. However, CNN-based architectures are limited in their ability to capture long-range dependencies and global contextual relationships within an image. Transformer-based architectures have recently emerged as powerful models that learn global contextual representations through self-attention. Nevertheless, pure Transformer models often require large-scale datasets and substantial computational resources, which limits their effectiveness in medical imaging applications where annotated data are scarce. This research proposes a hybrid CNN–Transformer framework for intelligent medical image diagnosis and classification that integrates the local feature extraction capability of CNNs with the global contextual learning ability of Transformer networks. The proposed architecture uses convolutional layers for hierarchical spatial feature extraction, followed by Transformer encoder modules for contextual representation learning and attention-based feature optimization. The framework incorporates adaptive attention mechanisms, feature fusion strategies, and classification layers to improve diagnostic accuracy and robustness across multiple medical imaging modalities. The framework is evaluated on benchmark medical image datasets for disease classification, tumor detection, and abnormality identification tasks. The experimental results show that the proposed hybrid CNN–Transformer framework significantly outperforms conventional CNN, Vision Transformer (ViT), and classical deep learning models in classification accuracy, precision, recall, F1-score, and computational efficiency.
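The abstract reports accuracy, precision, recall, and F1-score but does not state how per-class scores are combined for multi-class tasks. As a minimal sketch, assuming macro averaging (the unweighted mean over classes, a common choice when disease classes are imbalanced), these metrics can be computed from true and predicted labels as follows. The function name and the `normal`/`tumor` labels are hypothetical and not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Precision: of the samples predicted as c, the fraction that truly are c.
        p = tp[c] / pred_count[c] if pred_count[c] else 0.0
        # Recall (sensitivity): of the samples that truly are c, the fraction found.
        r = tp[c] / true_count[c] if true_count[c] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    n = len(labels)
    # Macro averaging (an assumption): every class counts equally.
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


y_true = ["normal", "tumor", "tumor", "normal", "tumor"]
y_pred = ["normal", "tumor", "normal", "normal", "tumor"]
print(classification_metrics(y_true, y_pred))
```

In this hypothetical example one tumor case is misclassified as normal, giving an accuracy of 0.8, macro precision and recall of about 0.83, and a macro F1 of about 0.8. Weighted or micro averaging would give different numbers, so the averaging scheme matters when comparing models.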
The proposed model generalizes better because it combines local texture analysis with learning of long-range contextual dependencies. The framework also improves diagnostic interpretability through attention visualization and adaptive feature learning. These findings indicate that hybrid CNN–Transformer architectures can provide efficient and scalable solutions for next-generation intelligent healthcare systems, computer-aided diagnosis, and automated clinical decision support.
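The abstract describes the pipeline only at a high level: convolutional layers for local features, Transformer encoder modules for global context, then classification layers, with attention maps used for visualization. The toy sketch below illustrates that data flow under our own assumptions; it is not the author's implementation. It uses a single 3×3 convolution layer with ReLU, patch-wise average pooling to turn feature maps into tokens, one single-head self-attention layer whose residual sum stands in (hypothetically) for feature fusion, mean pooling, and a linear classifier. Layer normalization, feed-forward blocks, positional encodings, and training are omitted. All class and function names, sizes, and the random untrained weights are hypothetical; pure Python is used only so the example runs without dependencies.

```python
import math
import random


def conv2d_relu(img, kernel):
    """Valid 2-D cross-correlation of a grayscale image with one kernel, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[max(0.0, sum(img[i + a][j + b] * kernel[a][b]
                          for a in range(kh) for b in range(kw)))
             for j in range(out_w)] for i in range(out_h)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyHybridCNNTransformer:
    """Hypothetical toy model: CNN features -> patch tokens -> self-attention -> classifier."""

    def __init__(self, channels=4, patch=2, num_classes=2, seed=0):
        rng = random.Random(seed)  # untrained, randomly initialised weights
        self.patch = patch
        self.kernels = [rand_matrix(rng, 3, 3) for _ in range(channels)]
        self.wq, self.wk, self.wv = (rand_matrix(rng, channels, channels) for _ in range(3))
        self.w_out = rand_matrix(rng, num_classes, channels)

    def tokens(self, img):
        """Local stage: convolve, then average each channel over non-overlapping patches."""
        maps = [conv2d_relu(img, k) for k in self.kernels]
        h, w, p = len(maps[0]), len(maps[0][0]), self.patch
        toks = []
        for i in range(0, h - p + 1, p):
            for j in range(0, w - p + 1, p):
                toks.append([sum(m[i + a][j + b] for a in range(p) for b in range(p)) / (p * p)
                             for m in maps])
        return toks

    def forward(self, img):
        toks = self.tokens(img)
        d, n = len(toks[0]), len(toks)
        q = [matvec(self.wq, t) for t in toks]
        k = [matvec(self.wk, t) for t in toks]
        v = [matvec(self.wv, t) for t in toks]
        # Global stage: every patch token attends to every other (scaled dot-product attention).
        attn = [softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
                for qi in q]
        # Residual sum fuses the local CNN token with its attention-weighted global context.
        mixed = [[t[c] + sum(attn[i][j] * v[j][c] for j in range(n)) for c in range(d)]
                 for i, t in enumerate(toks)]
        pooled = [sum(t[c] for t in mixed) / n for c in range(d)]
        probs = softmax(matvec(self.w_out, pooled))
        # Attention each patch receives, averaged over all queries: a crude saliency map.
        saliency = [sum(row[j] for row in attn) / n for j in range(n)]
        return probs, attn, saliency


image = [[(r * 6 + c) % 7 / 7.0 for c in range(6)] for r in range(6)]  # hypothetical 6x6 input
model = TinyHybridCNNTransformer(channels=4, patch=2, num_classes=2)
probs, attn, saliency = model.forward(image)
print("class probabilities:", probs)
print("patch saliency (2x2 grid, row-major):", saliency)
```

The returned `saliency` vector is one simple way to produce the kind of attention visualization the abstract mentions: reshaped to the patch grid, it can be upsampled and overlaid on the input image. A practical model would instead use a deep learning framework, a pretrained CNN backbone, several encoder layers with multi-head attention, and trained weights.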

Similar Articles

NEW CLASSIFICATION AND FILTERING TECHNIQUE FOR SPAM MESSAGE OF ONLINE SOCIAL NETWORK (OSN)

Hybrid Metaheuristic Optimization Techniques for Cloud Task Scheduling and Resource Management

Smart Campus Club Platform for Event Coordination and Student Engagement

Interactive Studying for Nursery Kids and Illiterate from Rural India: AR- based App

DATA MINING FOR MALICIOUS CODE DETECTION SYSTEM

DigiGram: An AI-Powered Digital Governance Platform for Rural Panchayats in India

Sentiment Analysis for Mental Health

Challenges in AI Development: A Multi-Dimensional Study of Risks and Solutions