Hybrid CNN–Transformer Architectures for Computer Vision-Based Medical Image Segmentation
Abstract
Medical image segmentation is an essential component of modern healthcare systems, supporting computer-aided diagnosis, treatment planning, disease monitoring, and clinical decision-making. Precise segmentation of anatomical structures and pathological regions in Magnetic Resonance Imaging (MRI), Computed Tomography (CT), ultrasound, and histopathological images is critical for diagnostic reliability and therapeutic efficiency. Conventional segmentation methods built on handcrafted features and classical machine learning algorithms often perform poorly in the presence of complex anatomical variation, low image contrast, noise, and irregular lesion boundaries. Convolutional Neural Networks (CNNs) have substantially advanced medical image analysis through automated, hierarchical feature extraction, but their ability to capture long-range contextual dependencies and global spatial information remains limited because convolutions operate on local neighborhoods. To address these limitations, this study proposes a hybrid CNN–Transformer architecture for medical image segmentation that combines CNN-based local feature learning with transformer-based global context modeling. The framework integrates an encoder–decoder structure, self-attention transformer modules, multi-scale feature fusion, and skip connections to achieve accurate semantic understanding and precise boundary localization. By jointly exploiting convolutional operations for fine-grained texture extraction and attention mechanisms for global dependency modeling, the proposed system significantly improves segmentation accuracy and robustness, achieving higher Dice Similarity Coefficient (DSC), Intersection over Union (IoU), and boundary precision than conventional CNN-based approaches such as U-Net and FCN, particularly on challenging and noisy medical images.
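For readers unfamiliar with the two overlap metrics named in the abstract, the following is a minimal sketch of how DSC and IoU are commonly computed for a pair of binary masks. It is not the authors' evaluation code: the function name `dice_and_iou`, the nested-list mask representation, and the convention that two empty masks score 1.0 are assumptions made here for illustration.

```python
def _flatten(mask):
    """Flatten a 2D mask (list of rows) into a flat list of 0/1 ints."""
    return [int(bool(v)) for row in mask for v in row]


def dice_and_iou(pred, target):
    """Return (DSC, IoU) for two binary masks with the same number of pixels.

    DSC = 2 * |P ∩ T| / (|P| + |T|)
    IoU =     |P ∩ T| / |P ∪ T|
    """
    p = _flatten(pred)
    t = _flatten(target)
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(a & b for a, b in zip(p, t))
    pred_area = sum(p)
    target_area = sum(t)
    union = pred_area + target_area - intersection

    # Assumed convention: two empty masks count as a perfect match.
    if pred_area + target_area == 0:
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks (hypothetical data): 1 = foreground, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]

dice, iou = dice_and_iou(pred, target)
print(f"DSC = {dice:.4f}, IoU = {iou:.4f}")  # DSC = 0.6667, IoU = 0.5000
```

For a single pair of masks the two metrics are monotonically related by DSC = 2·IoU / (1 + IoU), so they rank individual predictions identically. Averages over a dataset can still differ, which is one reason segmentation studies often report both.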