Abstract
Wireless capsule endoscopy (WCE) produces long video streams of variable quality, in which early and reliable polyp detection is critical. We present YOLO-InceptionResNet-A, a lightweight object detector that replaces the standard YOLOv4-tiny backbone with an Inception-ResNet-A block to enrich multi-scale feature representation while preserving real-time efficiency. The proposed pipeline operates in two stages: (i) a frame-level screening classifier that separates normal from abnormal frames, and (ii) the detector, which localizes polyps precisely. To respect clinical color sensitivity, we adopt a conservative, clinically aware augmentation policy (brightness and mild hue jitter) alongside standard normalization. We evaluate on the Kvasir family of WCE images using patient-level splits and report object-detection metrics (mAP@0.5, mAP@[.5:.95], precision/recall/F1, and IoU), frame-level classification metrics (AUROC, sensitivity, specificity), and throughput on a single RTX 3090 GPU. Across benchmarks, the backbone swap consistently improves detection mAP and recall over the YOLOv3, YOLOv4, and YOLOv4-tiny baselines while maintaining latency low enough for real-time review. Ablation studies isolate the contributions of the Inception-ResNet-A backbone and the augmentation policy, and show that richer multi-scale features are the primary driver of the gains. We discuss limitations related to dataset size and domain shift, and outline external validation on additional WCE datasets as future work. These results indicate that targeted backbone re-architecture can deliver lightweight yet precise WCE polyp detection without sacrificing speed, an attractive trade-off for clinical deployment.
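For readers unfamiliar with the detection metrics named above, the following is a minimal sketch of how IoU and precision/recall/F1 at a fixed IoU threshold of 0.5 are commonly computed for a single frame. It is not the authors' evaluation code: the function names, the `(x1, y1, x2, y2)` box format, and the greedy one-to-one matching rule are assumptions chosen for illustration. mAP additionally requires sweeping the confidence threshold and averaging precision over recall levels, which is not shown here.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(
    preds: List[Box], truths: List[Box], iou_thr: float = 0.5
) -> Tuple[float, float, float]:
    """Greedy matching: each prediction (assumed sorted by descending
    confidence) claims the unmatched ground-truth box it overlaps most,
    and counts as a true positive if that overlap reaches iou_thr."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(p, t)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth polyps that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1
```

In this sketch, a frame with one annotated polyp and two predicted boxes, only one of which overlaps the annotation at IoU of at least 0.5, yields a precision of 0.5 and a recall of 1.0.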