Lightweight Retinal Layer Segmentation With Global Reasoning
Yiming Wang; Fabio Poiesi
2024-01-01
Abstract
Automatic retinal layer segmentation of medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, accurate segmentation is challenging due to the low contrast and blood flow noise present in the images. In addition, the algorithm should be lightweight so that it can be deployed in practical clinical applications. It is therefore desirable to design a lightweight, high-performance network for retinal layer segmentation. In this article, we propose LightReSeg, a retinal layer segmentation method for OCT images. Our approach follows an encoder-decoder structure: the encoder uses multiscale feature extraction and a transformer block to fully exploit the semantic information of feature maps at all scales and to give the features better global reasoning capabilities, while in the decoder we design a multiscale asymmetric attention (MAA) module to preserve the semantic information at each encoder scale. Experiments show that our approach, with only 3.3 M parameters, achieves better segmentation performance than the current state-of-the-art method TransUnet (105.7 M parameters) on both our collected dataset and two public datasets.
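The abstract gives only the high-level design: a multiscale convolutional encoder, a transformer block for global reasoning at the deepest scale, and MAA-gated skip connections in the decoder. The following is a minimal PyTorch sketch of such an architecture, not the paper's implementation. In particular, `MAABlock` is a hypothetical stand-in (a simple channel-attention gate), and the channel widths, transformer depth and heads, the 9-class output, and the 224x224 input size are illustrative assumptions.

```python
import torch
import torch.nn as nn


class MultiScaleEncoder(nn.Module):
    """Multiscale CNN encoder: each stage halves the spatial resolution."""

    def __init__(self, in_ch=1, widths=(16, 32, 64, 128)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, w, 3, stride=2, padding=1),
                nn.BatchNorm2d(w),
                nn.ReLU(inplace=True),
            ))
            ch = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # one feature map per scale, shallow to deep


class TransformerBottleneck(nn.Module):
    """Plain transformer encoder over the flattened deepest feature map,
    giving the features global (long-range) reasoning capability."""

    def __init__(self, dim=128, depth=2, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        tokens = self.encoder(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)


class MAABlock(nn.Module):
    """Hypothetical stand-in for the paper's multiscale asymmetric attention
    (MAA): here, a simple channel-attention gate on the skip connection."""

    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, skip):
        return skip * self.gate(skip)


class UpBlock(nn.Module):
    """Decoder step: upsample, concatenate the attended skip, then fuse."""

    def __init__(self, in_ch, skip_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, skip_ch, 2, stride=2)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * skip_ch, skip_ch, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        return self.fuse(torch.cat([self.up(x), skip], dim=1))


class LightReSegSketch(nn.Module):
    def __init__(self, in_ch=1, num_classes=9, widths=(16, 32, 64, 128)):
        super().__init__()
        self.encoder = MultiScaleEncoder(in_ch, widths)
        self.bottleneck = TransformerBottleneck(dim=widths[-1])
        self.maa = nn.ModuleList(MAABlock(w) for w in widths[:-1])
        self.ups = nn.ModuleList(
            UpBlock(widths[i], widths[i - 1]) for i in range(len(widths) - 1, 0, -1))
        self.head = nn.Sequential(  # recover full resolution, per-pixel classes
            nn.ConvTranspose2d(widths[0], widths[0], 2, stride=2),
            nn.Conv2d(widths[0], num_classes, 1))

    def forward(self, x):
        feats = self.encoder(x)
        x = self.bottleneck(feats[-1])  # global reasoning at the deepest scale
        for up, maa, skip in zip(self.ups, reversed(self.maa), feats[-2::-1]):
            x = up(x, maa(skip))
        return self.head(x)


if __name__ == "__main__":
    model = LightReSegSketch(in_ch=1, num_classes=9)
    out = model(torch.randn(1, 1, 224, 224))  # a 224x224 grayscale OCT B-scan
    print(out.shape)  # -> torch.Size([1, 9, 224, 224])
```

With the illustrative widths above the sketch stays in the few-million-parameter range, which is consistent with the lightweight design goal the abstract states; the actual 3.3 M configuration and the real MAA module are defined in the paper itself.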