EFFECT OF DATA QUALITY ON WATER BODY SEGMENTATION WITH DEEPLABV3+ ALGORITHM



INTRODUCTION
Image segmentation, particularly semantic image segmentation, is a crucial task in computer vision. This process involves labelling each pixel in an image with a class, thereby providing a detailed understanding of the image at a granular level. Semantic image segmentation has broad applications ranging from autonomous driving to remote sensing (Subramanian et al., 2022) (Ramiya et al., 2016), medical image analysis, and robotics (Badrinarayanan et al., 2017) (Long et al., 2015) (Zhao et al., 2017).
One of the state-of-the-art architectures that have significantly impacted the field of semantic image segmentation is DeepLabV3+ (Chen et al., 2018) (Sunandini et al., 2023). This architecture combines the strengths of atrous convolutions, spatial pyramid pooling modules, and an encoder-decoder structure, thereby enhancing boundary detection and dealing with objects of different scales effectively (Yang et al., 2018) (George et al., 2023).
However, the performance of DeepLabV3+, like that of any other deep learning model, depends on the quality of the training data. The more accurate and diverse the training data, the better the model's performance in segmenting images (George et al., 2023). This relationship is particularly noticeable in tasks such as segmenting water bodies from satellite images, which is the focus of this study (Harika et al., 2022).
Our objective is to delve deeper into this dependency on data quality. We investigate the influence of the quality of data while training the DeepLabV3+ algorithm for segmenting water bodies from satellite images. We scrutinize scenarios where there is a mismatch in the mask of the training data or when the training data consists of mixed water quality, such as clear and turbid water or water with floating vegetation. It is essential to understand whether excluding such cases would improve or degrade the model's performance (Jean et al., 2019) (Harika et al., 2022).
There is a general belief that cleaner and more consistent training data leads to better performance in machine learning models. However, when dealing with real-world scenarios, such as satellite image segmentation, data inconsistencies are often the norm rather than the exception. Hence, it is important to understand the behaviour of models like DeepLabV3+ under these circumstances, as they reflect realistic conditions that these models would encounter in actual deployments (Volpi and Tuia, 2016).
In the following sections, we present our findings, shedding light on the influence of data quality on the performance of DeepLabV3+ in the task of segmenting water bodies from satellite images.

Dataset
This study uses an open-source Kaggle dataset of satellite images and masks captured by the Sentinel-2A and Sentinel-2B satellites. The images and masks in the dataset each have three bands: red (R), green (G), and blue (B), which capture both the spectral and spatial information essential for analysis. Figure 1 shows an RGB image and the corresponding water/no-water mask (right), with the water class shown in white pixels.

Data Processing
The dataset consists of images and masks of varying sizes. To maintain uniformity and avoid bias in training the model, all images and masks were resized to 256x256 pixels. This resizing ensures consistency in the data, allowing for efficient processing and analysis. The masks were then recoded: background pixels were labelled 0, representing areas without water, and water pixels were labelled 1, indicating the presence of water.
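
The resizing and recoding steps can be illustrated with a minimal sketch. It assumes nearest-neighbour resampling (so that mask values are not blended) and a threshold of 127 for separating white water pixels from the background; neither choice is stated in this study, and the function names are hypothetical. A real pipeline would normally use an image library rather than nested lists.

```python
def resize_nearest(grid, out_h=256, out_w=256):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def recode_mask(mask, threshold=127):
    """Map mask pixels to 1 (water) above the threshold, else 0 (background).

    The threshold value is an assumption made for this illustration.
    """
    return [[1 if value > threshold else 0 for value in row] for row in mask]


# Toy 2x2 mask with white (255) water pixels in the right-hand column.
toy_mask = [[0, 255], [0, 255]]
resized = resize_nearest(toy_mask)
binary = recode_mask(resized)
```

After these two steps every mask has the same 256x256 shape and contains only the labels 0 and 1.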
After resizing and recoding, the dataset was split into training, testing, and validation sets in a ratio of 80:10:10. The training set, comprising 80% of the data, is used to train the model. The testing set, comprising 10% of the data, is used to evaluate how well the model performs on new, unseen images.
The validation set, comprising the remaining 10% of the data, is used to measure the model's performance after each iteration and to guide the adjustment of hyperparameters. Splitting the dataset into separate sets allows for thorough model development and ensures that the model is tested on unbiased data to assess its ability to classify water effectively.
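
An 80:10:10 split of this kind can be sketched as follows. The shuffling, the fixed seed, and the image identifiers are assumptions made for illustration; the study does not describe how the split was drawn.

```python
import random


def split_dataset(items, train_frac=0.8, test_frac=0.1, seed=42):
    """Shuffle items and split them into train, test, and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # the remainder goes to validation
    return train, test, val


# Hypothetical image identifiers standing in for the real file names.
image_ids = [f"image_{i}" for i in range(1000)]
train_ids, test_ids, val_ids = split_dataset(image_ids)
```

Because the three lists are slices of one shuffled sequence, no image can appear in more than one set.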

Data Refining
In the process of data refining, the objective was to enhance the performance of the model. To achieve this, three separate experiments were conducted, each training a new instance of DeepLabV3+.
These experiments refine the data by removing images with mismatched masks and images of mixed water quality, such as those with floating vegetation.

Experiment 1:
Only the pre-processing described earlier (resizing to 256x256 pixels) was performed in this experiment, and no refinement of the data was made.

Experiment 2:
From the original dataset, images with incorrect masks were removed to ensure that the network learns from correct masks. Training on accurate masks is expected to give better segmentation results. Figure 2 shows an example: the image contains land, but NDWI (Normalized Difference Water Index) labelled the land as water in the mask.

Figure 2. Sample RGB image (left) and the incorrect mask (right), where land was identified as water.
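
For context, the sketch below shows how an NDWI-based mask is commonly derived, using the standard definition NDWI = (Green - NIR) / (Green + NIR). The near-infrared input, the threshold of 0, and the function names are assumptions for illustration; this study does not state which bands or threshold were used to produce the dataset's masks.

```python
def ndwi(green, nir):
    """NDWI = (green - nir) / (green + nir); returns 0.0 when both bands are zero."""
    total = green + nir
    return (green - nir) / total if total else 0.0


def water_mask(green_band, nir_band, threshold=0.0):
    """Label a pixel 1 (water) when its NDWI exceeds the threshold, else 0.

    The threshold of 0.0 is an assumed, commonly used default.
    """
    return [
        [1 if ndwi(g, n) > threshold else 0 for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


# Toy reflectances: left pixel water-like (green > NIR), right pixel vegetation-like.
green_band = [[0.10, 0.08]]
nir_band = [[0.03, 0.40]]
mask = water_mask(green_band, nir_band)
```

Because the label depends only on a per-pixel index and a fixed threshold, any land surface whose green reflectance exceeds its near-infrared reflectance is marked as water, which is one way a mask such as the one in Figure 2 can arise.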

Experiment 3:
For the final experiment, images of mixed water quality, such as turbid water and floating vegetation, were removed. This also addressed class imbalance: the dataset originally contained fewer samples of turbid water and floating vegetation than of good-quality water. Removing these under-represented samples refined the dataset with the aim of enhancing the model's performance. Figure 3 and Figure 4 show a turbid water image and a floating vegetation image with their respective masks.

Selection of DeepLabV3+ Architecture for Water Body Segmentation
DeepLabv3+ has distinguished itself as a potent tool for semantic image segmentation, a technique that assigns every pixel of an image to a specific class label, facilitating intricate analysis of images. The model's application ranges from augmenting the perception capability of autonomous vehicles to improving the precision of medical diagnoses. The choice of DeepLabv3+ for this study lies in its exceptional architecture that accommodates various scales and contexts within images and its proficiency in capturing details regardless of scale, a characteristic that aligns with the properties of our dataset.
The architecture of DeepLabv3+ incorporates an advanced Atrous Spatial Pyramid Pooling (ASPP) module and an encoder-decoder structure. This combination aids in maximizing the contextual information derived from the images while ensuring that fine details are not lost. The foundation of this architecture is the Xception model, a high-performing convolutional neural network used to extract preliminary features from input images.
The ASPP module, the core of the DeepLabv3+ architecture, utilizes filters at diverse scales concurrently. This technique enables the model to capture contextual information from different sized areas in the image, an indispensable feature for our dataset, given the varying sizes of water bodies. Once this multi-scale feature extraction is complete, these features become inputs for the decoder.
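
The idea behind applying filters at several scales can be illustrated with a one-dimensional atrous (dilated) convolution. This is a simplified sketch, not the ASPP module itself: the rates, the toy signal, and the function names are illustrative assumptions, and the operation is the unflipped (cross-correlation) form used in deep learning frameworks.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution with 'valid' padding.

    Kernel taps are spaced `rate` samples apart, so the kernel covers
    (len(kernel) - 1) * rate + 1 input samples without extra weights.
    """
    span = (len(kernel) - 1) * rate + 1
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def effective_kernel_size(kernel_size, rate):
    """Number of input samples spanned by a dilated kernel."""
    return (kernel_size - 1) * rate + 1


signal = list(range(10))  # toy input: 0, 1, ..., 9
kernel = [1, 0, -1]       # simple difference filter
# Parallel branches at several rates, in the spirit of ASPP (rates are illustrative).
branches = {rate: atrous_conv1d(signal, kernel, rate) for rate in (1, 2, 3)}
```

The same three weights respond to differences over 2, 4, and 6 samples as the rate grows, which is how a single filter size can capture context from both small and large water bodies.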
The decoder's role in the DeepLabv3+ architecture is to refine the segmentation results using the extracted features. It upsamples these feature maps to a higher resolution and combines them with earlier-stage feature maps from the network. This blend of high-level contextual information and detailed spatial information allows the decoder to generate precise and granular segmentation maps.
The output from DeepLabv3+ is a categorically labelled image, noted for its crisp object boundaries and comprehensive contextual understanding. The incorporation of state-of-the-art techniques such as depth-wise separable convolutions enhances its computational efficiency. As a result of this, and by leveraging the capabilities of the ASPP module and the encoder-decoder structure, DeepLabv3+ serves as an ideal choice for our study, providing a perfect blend of efficiency and accuracy for the task of water body segmentation.
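
The efficiency gain from depth-wise separable convolutions can be made concrete with a short weight count. The layer dimensions below (3x3 kernel, 256 input and output channels) are an illustrative assumption, not a configuration reported in this study, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    return k * k * c_in + c_in * c_out


# Illustrative layer: 3 x 3 kernel, 256 input and 256 output channels.
standard = standard_conv_params(3, 256, 256)
separable = separable_conv_params(3, 256, 256)
reduction = standard / separable
```

For this layer the standard convolution needs 589,824 weights and the separable version 67,840, a reduction of roughly 8.7 times.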

Accuracy and Loss plots
Accuracy and loss plots provide valuable insights into the learning process of a model during training. As training progresses, the model adjusts its parameters to improve accuracy and reduce loss. With more epochs, accuracy improves as the model learns the data's patterns. Ideally, validation accuracy converges with training accuracy, indicating effective generalization. Loss decreases as the model's predictions align better with true values. Eventually, the reduction in loss flattens as the model captures most relevant patterns and further adjustments have diminishing returns.

Metrics
Relying on accuracy as the sole evaluation metric may lead to an incomplete assessment of model performance, particularly in scenarios involving imbalanced datasets. While accuracy considers both classes, precision, recall, and F1 score focus specifically on the target class.

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]

Precision quantifies the proportion of correctly predicted positive instances relative to the total predicted positives, whereas recall measures the proportion of true positive (TP) instances identified by the model.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

F1 score harmonizes precision and recall into a single metric, providing a balanced assessment that accounts for both false positives (FP) and false negatives (FN).

\[ \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]
By incorporating precision, recall, and F1 score alongside accuracy, a more comprehensive and nuanced understanding of the model's performance within the target class can be attained, enabling more informed decision-making.
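
The four metrics can be computed from pixel-wise confusion counts as in the sketch below. The toy labels and function names are hypothetical and chosen only to show how accuracy can remain high while precision is low on an imbalanced mask; returning 0.0 for an undefined ratio is a convention assumed here.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = water, 0 = background)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def segmentation_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy imbalanced example: 10 pixels, only 2 of them water.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = segmentation_metrics(*counts)
```

Here the counts are TP = 1, TN = 6, FP = 2, FN = 1, giving an accuracy of 0.70 but a precision of only about 0.33, a recall of 0.50, and an F1 score of 0.40.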

Experiment 1:
The accuracy plot obtained when DeepLabV3+ was trained with all images (n = 6422) shows the accuracy increasing gradually to around 90% (Figure 5). However, the loss values shown in Figure 6 remain relatively high, indicating that the model faced some challenges in reducing its errors. The metrics obtained for this experiment are summarized in Table 2.

Experiment 2:
The accuracy plot obtained when DeepLabV3+ was trained after removing RGB images with incorrect masks (n = 1581) shows an improvement in model accuracy (Figure 7). The accuracy improved steadily and reached approximately 95%. However, the loss values remain relatively high (Figure 8), indicating that the model continues to encounter challenges in minimizing its errors. The evaluation metrics changed only marginally, with precision rising from 0.4692 to 0.4827.

Experiment 3:
When DeepLabV3+ was trained with images of clear water bodies and correct masks, the accuracy and loss plots show a marked improvement. Despite using fewer images, the overall accuracy surpassed 95% (Figure 9). This suggests that the model's ability to generalize improved when there was less variation in the training data. Moreover, the validation accuracy and loss converge closely with the training curves, showcasing the model's robustness. This indicates enhanced learning capabilities compared to the previous experiments, evident in the gradual decrease in loss over time (Figure 10). The precision score increased from 0.4827 to 0.9792 when RGB images of turbid water and floating vegetation were removed. The enhanced performance of the model underscores the importance of dataset cleaning and its influence on training deep learning models for image segmentation tasks.

CONCLUSION
In this study, we investigated the influence of data refinement on model predictions. Our findings underscore the importance of data quality in improving model performance. Prior to refining the data, the model achieved a precision of 0.4692. However, after implementing data refinement techniques, including the removal of mismatched masks and mixed water quality images, notable improvements in precision were observed. The precision increased to 0.4827 after removing mismatched masks and further improved to an impressive 0.9792 after excluding mixed water quality images. These results highlight the significance of addressing data quality issues to enhance model accuracy and reliability. By prioritizing data refinement, researchers and practitioners can optimize model performance and minimize potential errors. Overall, this study underscores the crucial relationship between data quality and model predictions, emphasizing the need for meticulous data refinement to achieve more accurate and reliable results.