RSB-MedNeXt: An attempt at beating the STU-Net through Robust Stem and Bottleneck Design
Keywords: Medical image segmentation, STU-Net, MedNeXt, U-Net
Abstract. Medical image segmentation is a crucial task that supports clinical diagnosis and treatment planning. Deep learning, and in particular U-Net and its variants, has transformed the field in both theory and practice. More recently, STU-Net and similar works were released to improve scalability and transferability, two weaknesses of U-Net, and they have led to significant practical advances in medical applications. However, STU-Net trades efficiency for performance disproportionately, so fine-tuning it is very costly relative to the improvement it yields over training from scratch. In this paper, we systematically analyze the architectural strengths of STU-Net and MedNeXt and identify the limitations that hinder optimal feature learning. Based on this analysis, we propose RSB-MedNeXt, a more robust CNN architecture designed to surpass STU-Net while maintaining efficiency. Our architecture introduces two key innovations: (1) a robust stem module with three parallel branches that extract information at multiple scales, and (2) a hybrid bottleneck that combines CNN-based feature extraction with self-attention to capture both fine-grained details and global context. We integrate our network into the nnU-Net framework and conduct comprehensive experiments on multiple segmentation tasks against STU-Net and MedNeXt. Results demonstrate that RSB-MedNeXt achieves superior performance while requiring fewer computational resources than STU-Net. We hope that this approach helps address the trade-off between performance and efficiency in medical image segmentation and offers a promising option for resource-constrained clinical applications.
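To make the two components named in the abstract concrete, the following is a minimal PyTorch sketch, not the RSB-MedNeXt implementation. The abstract does not specify kernel sizes, channel widths, normalization, or how the branches are fused, so the class names `MultiScaleStem` and `HybridBottleneck`, the kernel sizes (1, 3, 5), the 1x1x1 fusion convolution, and the use of `nn.MultiheadAttention` are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultiScaleStem(nn.Module):
    """Hypothetical stem: three parallel 3D conv branches (assumed kernels 1, 3, 5)."""

    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(1, 3, 5)):
        super().__init__()
        # One branch per kernel size; padding k // 2 keeps the spatial size unchanged.
        self.branches = nn.ModuleList(
            [nn.Conv3d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes]
        )
        # Assumed fusion: concatenate branch outputs, then a 1x1x1 conv back to out_ch.
        self.fuse = nn.Conv3d(out_ch * len(kernel_sizes), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))


class HybridBottleneck(nn.Module):
    """Hypothetical bottleneck: a conv block followed by self-attention over voxels."""

    def __init__(self, ch: int, heads: int = 4):
        super().__init__()
        # Local (fine-grained) features: depthwise 3x3x3 conv, norm, activation.
        self.conv = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1, groups=ch),
            nn.InstanceNorm3d(ch),
            nn.GELU(),
        )
        # Global context: multi-head self-attention over flattened voxel tokens.
        self.norm = nn.LayerNorm(ch)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.conv(x)                       # residual conv path
        b, c, d, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, D*H*W, C)
        q = self.norm(tokens)
        attn_out, _ = self.attn(q, q, q)
        tokens = tokens + attn_out                 # residual attention path
        return tokens.transpose(1, 2).reshape(b, c, d, h, w)


if __name__ == "__main__":
    volume = torch.randn(1, 1, 8, 8, 8)            # (batch, channel, D, H, W)
    features = MultiScaleStem(1, 8)(volume)
    # Attention is quadratic in the number of voxels, so it is applied to a small grid here.
    out = HybridBottleneck(8)(features[:, :, :4, :4, :4])
    print(features.shape, out.shape)
```

Placing self-attention only at the bottleneck, where the feature map has the lowest resolution, keeps the quadratic attention cost small; this is the usual motivation for such hybrid designs and is assumed here rather than taken from the paper.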