EFFECT OF HYPERPARAMETERS ON DEEPLABV3+ PERFORMANCE TO SEGMENT WATER BODIES IN RGB IMAGES

ABSTRACT: Deep Learning (DL) networks used in image segmentation tasks must be trained with input images and corresponding masks that identify target features in them. DL networks learn by iteratively adjusting the weights of interconnected layers using backpropagation, a process that involves calculating gradients and minimizing a loss function. This allows the network to learn patterns and relationships in the data, enabling it to make predictions or classifications on new, unseen data. Training any DL network requires specifying values of hyperparameters such as input image size, batch size, and number of epochs, among others. Failure to specify optimal values for these hyperparameters can increase training time or result in incomplete learning. The aim of this study was to evaluate the effect of input image size and batch size on the performance of DeepLabV3+ using Sentinel-2 A/B RGB images and labels obtained from Kaggle. We trained the DeepLabV3+ network six times, with two sets of input images of 128 x 128-pixel and 256 x 256-pixel dimensions, using batch sizes of 4, 8 and 16. Each model was trained for 100 epochs to ensure that the loss plot reached saturation and the model converged to a stable solution. Predicted masks generated by each model were compared to their corresponding test mask images based on accuracy, precision, recall and F1 scores. Results from this study demonstrated that an image size of 256 x 256 with a batch size of 4 achieved the highest performance. It can also be inferred that the larger input image size improved DeepLabV3+ model performance.


INTRODUCTION
Machine learning and deep learning are both subfields of artificial intelligence (AI) that involve training networks to learn from data and make predictions or decisions (Mahesh et al., 2020). While they share some similarities, deep learning represents a more advanced and specialized approach within the broader scope of machine learning. Deep learning is a subset of machine learning that focuses on using neural networks with multiple layers to process and analyse complex data. Deep learning algorithms, also known as artificial neural networks, are inspired by the structure and function of the human brain.
The key difference between deep learning and traditional machine learning lies in the level of abstraction and feature engineering required. In traditional machine learning (ML), the task of manually identifying and extracting relevant features from the data falls upon domain experts, making it a time-consuming and challenging process. Deep learning (DL), on the other hand, aims to automate this feature engineering step by directly learning hierarchical representations from raw data. DL algorithms leverage neural networks to extract and analyse features at various levels of abstraction, enabling them to capture complex patterns. However, it is important to note that in certain scenarios, DL algorithms may still require manual annotation or feature identification. By incorporating manual annotations, DL algorithms can leverage domain experts' knowledge and improve the efficiency and accuracy of the learning process. Therefore, while DL strives to automate feature extraction, there are instances where manual annotation remains essential for achieving optimal model performance.

Forward and backward propagation are fundamental processes in deep learning that enable the training of neural networks by iteratively adjusting the model's parameters to minimize the difference between predicted outputs and ground truth values. During forward propagation, input data is fed into the neural network, and computations are performed layer by layer, moving from the input layer to the output layer. Each layer consists of interconnected neurons that apply weighted sums and activation functions to produce outputs. The outputs from one layer serve as inputs to the next layer until the final output is generated. This forward pass allows the neural network to make predictions based on the current values of its parameters.

Next, backward propagation, also known as backpropagation, takes place. Backpropagation calculates the gradients of the loss function with respect to the network's parameters (Li et al., 2012). It starts at the output layer and iteratively works backward through the network, adjusting the weights and biases based on the computed gradients. The gradients are computed using the chain rule of calculus, which allows for efficient computation of gradients at each layer. By propagating the gradients backward, the neural network updates its parameters in a way that reduces the difference between predicted outputs and the actual targets. This iterative process continues until the model converges to a state where prediction accuracy is optimized and the network has learned to generalize well to unseen data; a minimal sketch of these two steps is given at the end of this section.

To successfully train a deep learning network, an analyst must specify certain constraints, or hyperparameters. These hyperparameters determine the behaviour and performance of a DL network, and specifying appropriate hyperparameter values is essential for achieving optimal model performance.
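The following sketch illustrates one forward and backward pass in TensorFlow, the framework used later in this study. The toy two-layer network and random data are illustrative assumptions only, not part of the experimental setup.

```python
# A minimal sketch of forward and backward propagation in TensorFlow,
# using a small, hypothetical two-layer network on random data.
import tensorflow as tf

# Hypothetical toy data: 32 samples with 10 features, binary labels.
x = tf.random.normal((32, 10))
y = tf.cast(tf.random.uniform((32, 1), maxval=2, dtype=tf.int32), tf.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),    # hidden layer: weighted sums + activation
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Forward pass: inputs flow layer by layer to produce predictions.
with tf.GradientTape() as tape:
    predictions = model(x, training=True)
    loss = loss_fn(y, predictions)

# Backward pass: gradients of the loss w.r.t. every weight, via the chain rule.
grads = tape.gradient(loss, model.trainable_variables)
# Parameter update: weights move in the direction that reduces the loss.
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```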

Hyperparameters
Hyperparameters are configuration values, set before training, that constrain how the network executes its forward and backward propagation steps (Hertel et al., 2020). The common hyperparameters include learning rate, batch size, image size, number of epochs, activation function, optimizer, regularization, and initial weights. The hyperparameters most relevant to this study are described below.

1.1.1 Learning rate: The learning rate is a critical hyperparameter that controls the step size of parameter updates during training. It influences the balance between convergence speed and accuracy. Choosing an appropriate learning rate is crucial to ensure efficient and effective model training.

1.1.2 Image size: Image size refers to the dimensions of the input images fed into the neural network (Probst et al., 2018). The choice of image size can significantly impact the performance of the segmentation model. Larger image sizes may capture more detailed information but can also increase computational requirements. On the other hand, smaller image sizes may sacrifice some detail but can lead to faster processing times.

1.1.3 Batch size: Batch size is the number of training samples processed together in a single step of the training process. It influences the trade-off between computational efficiency and model convergence. A larger batch size can accelerate training by parallelizing computations, but it may also require more memory resources. Conversely, a smaller batch size provides more frequent weight updates but can make training more computationally expensive.

1.1.4 Number of epochs: The number of epochs determines how many times the full dataset is iterated over during training. Finding the right balance is important, as too few epochs may result in underfitting, while too many can lead to overfitting. Proper selection requires experimentation and monitoring to achieve optimal model performance.

1.1.5 Activation function: Activation functions are essential hyperparameters in neural networks, introducing non-linearity into the model. They affect the model's capacity to learn and approximate complex functions. Popular choices include sigmoid, tanh, and ReLU, each with unique properties that impact network performance. Proper selection and experimentation are crucial to maximize model capabilities; the sketch below shows where these hyperparameters typically appear in a training setup.
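As a minimal illustration, the following sketch shows how the hyperparameters discussed above map onto a typical Keras training configuration. All values, the toy model, and the commented fit call are assumptions for demonstration; the study's actual configurations are given in the Hyperparameter Configurations section.

```python
# Illustrative sketch: common hyperparameters in a Keras training setup.
import tensorflow as tf

IMAGE_SIZE = (256, 256)   # input image dimensions (height, width)
BATCH_SIZE = 8            # samples processed per training step
EPOCHS = 100              # full passes over the training dataset
LEARNING_RATE = 1e-4      # step size for weight updates

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(*IMAGE_SIZE, 3)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),  # ReLU activation
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),                # per-pixel water probability
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# Hypothetical training call, assuming image/mask arrays are available:
# model.fit(train_images, train_masks, batch_size=BATCH_SIZE, epochs=EPOCHS)
```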
In this study, we evaluated the role of image and batch sizes in training DeepLabV3+ (Chen et al., 2018), a DL network, for segmenting water bodies in satellite images (Liu et al., 2021; George et al., 2023). By investigating the effects of varying image sizes and batch sizes on water body segmentation, this study aims to optimize these hyperparameters for achieving accurate and efficient segmentation results. Understanding the impact of these hyperparameters can provide insights into the trade-offs between computational resources and segmentation performance (Yuan et al., 2021), ultimately contributing to the advancement of water body segmentation techniques in remote sensing and environmental monitoring applications.

Dataset Description
The dataset used in this research was obtained from Kaggle, a website that provides access to a diverse collection of open datasets. Specifically, the dataset comprises satellite images of water bodies along with their corresponding masks in greyscale format (Figure 1). The images were captured by the Sentinel-2 (Escobar, n.d.) satellite, a remote sensing platform used for Earth observation, and are in RGB (Red, Green, Blue) format, representing the visual spectrum. Each image is accompanied by a black-and-white mask in which white pixels indicate water and black pixels represent non-water areas. The Normalized Difference Water Index (NDWI), a commonly used index for detecting water in satellite imagery, was used to identify water pixels (McFeeters, 1996); for this dataset, a higher threshold than usual was employed to specifically detect and delineate water bodies. The images were pre-processed using Rasterio, a Python library for handling geospatial data, to ensure compatibility and appropriate formatting for further analysis.
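As an illustration of the NDWI-based labelling, the index is defined as NDWI = (Green - NIR) / (Green + NIR) (McFeeters, 1996), and a minimal sketch of thresholding it is shown below. The band arrays are assumed to have been read already (e.g. with Rasterio); the helper name ndwi_mask and the 0.3 threshold are illustrative assumptions, since the exact threshold used by the dataset's creators is not reproduced here.

```python
# A minimal sketch of NDWI-based water masking from Sentinel-2 band arrays.
import numpy as np

def ndwi_mask(green: np.ndarray, nir: np.ndarray, threshold: float = 0.3) -> np.ndarray:
    """Return a binary water mask: 1 where NDWI exceeds the threshold."""
    green = green.astype(np.float32)
    nir = nir.astype(np.float32)
    ndwi = (green - nir) / (green + nir + 1e-8)  # small epsilon avoids division by zero
    return (ndwi > threshold).astype(np.uint8)
```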

Data Pre-Processing
In the data pre-processing step of this work, quality control measures were implemented to ensure the accuracy and reliability of the dataset. This included identifying and removing images with incorrect masks, such as green vegetation or bare ground misclassified as water. Further, images containing turbid water or water with varying colors were excluded to ensure data uniformity. After this step, only images with good water quality and correct masks were retained.
Next, significant emphasis was placed on standardizing image sizes and creating subsets for training, validation, and testing. The original dataset consisted of images of varying sizes, which were divided into smaller patches of size 128x128 and 256x256. This division into patches offered computational efficiency, localized analysis, and the ability to capture both fine-grained and contextual information, thereby enhancing the effectiveness of subsequent analyses and model training.
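A minimal sketch of this patching step is shown below; the helper name to_patches and the choice to discard partial border tiles are assumptions made for illustration.

```python
# A minimal sketch of dividing an image (and its mask) into fixed-size,
# non-overlapping patches; the patch size is 128 or 256 as in the study.
import numpy as np

def to_patches(image: np.ndarray, patch: int) -> list[np.ndarray]:
    """Split an (H, W, C) or (H, W) array into patch x patch tiles."""
    h, w = image.shape[:2]
    tiles = []
    for row in range(0, h - patch + 1, patch):
        for col in range(0, w - patch + 1, patch):
            tiles.append(image[row:row + patch, col:col + patch])
    return tiles

# Usage: the same grid is applied to the image and its mask so pairs stay aligned.
# image_patches = to_patches(image, 256)
# mask_patches = to_patches(mask, 256)
```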

DeepLabV3+ for Water body Segmentation
Satellite images are widely used for water body extraction (Rithin Paul Reddy et al., 2018). The water body segmentation in this study was performed using the DeepLabv3+ architecture, a highly effective convolutional neural network (CNN) model for semantic segmentation (Chen et al., 2018). CNN-based networks are a powerful approach for satellite image processing (Thanga Manickam et al., 2021). The training dataset consisted of image patches of sizes 128x128 and 256x256, allowing the model to capture both fine-grained details and contextual information.
During the training process, batch sizes of 4, 8, and 16 were used, with the corresponding masks guiding the learning process. The DeepLabv3+ architecture incorporates an encoder-decoder structure with atrous spatial pyramid pooling (ASPP) modules, enabling multi-scale feature fusion and accurate predictions on objects of different sizes (Sunandini et al., 2023). By utilizing dilated convolutions at multiple rates, the model captures features at various scales, while skip connections preserve fine-grained details. This combination of architecture and dataset facilitated precise and reliable water body segmentation.
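To illustrate the idea of multi-scale feature extraction, the following simplified sketch shows an ASPP-style block built from parallel dilated convolutions in TensorFlow. The dilation rates and filter count are common choices from the DeepLab literature; the block is a schematic stand-in, not the exact implementation used in this study.

```python
# A simplified sketch of an atrous spatial pyramid pooling (ASPP) block.
import tensorflow as tf

def aspp_block(inputs: tf.Tensor, filters: int = 256) -> tf.Tensor:
    # 1x1 convolution branch preserves local information.
    branches = [
        tf.keras.layers.Conv2D(filters, 1, padding="same", activation="relu")(inputs)
    ]
    # Parallel 3x3 convolutions with increasing dilation rates widen the
    # receptive field without shrinking the feature map.
    for rate in (6, 12, 18):
        branches.append(
            tf.keras.layers.Conv2D(
                filters, 3, padding="same", dilation_rate=rate, activation="relu"
            )(inputs)
        )
    merged = tf.keras.layers.Concatenate()(branches)  # multi-scale feature fusion
    return tf.keras.layers.Conv2D(filters, 1, padding="same", activation="relu")(merged)

# Example wiring (hypothetical): features = aspp_block(backbone_output)
```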

Technical Requirements
The study required specific hardware and software configurations to facilitate the computational tasks involved in deep learning experiments. For hardware, a High-Performance Computation (HPC) server was utilized. The server comprised a total of 5 nodes, with each node equipped with 28 CPUs and 254 GB of memory. This hardware infrastructure provided the substantial computational power and memory capacity necessary for processing large-scale datasets and training complex deep learning models. In terms of software requirements, Python, a widely adopted programming language in the field of machine learning and deep learning, was used as the primary programming language. Python offers extensive libraries and frameworks that support various aspects of deep learning implementation. Furthermore, TensorFlow, a popular open-source deep learning framework, was employed as the core software tool. TensorFlow provides a comprehensive ecosystem for designing, training, and evaluating deep neural networks. Its scalability and flexibility made it well-suited for conducting the deep learning experiments in this research study.

Hyperparameter Configurations
To optimize the performance of the deep learning models for water body segmentation (Tsai et al., 2020), specific hyperparameter configurations were employed in this study. Two key hyperparameters under consideration were the image size and batch size. The image size, chosen as 128x128 and 256x256, determined the input dimensions of the neural networks and played a crucial role in capturing the desired level of detail from the satellite images. By selecting appropriate image sizes, a balance was achieved between computational efficiency and the ability to capture fine-grained features. The batch size, on the other hand, determined the number of training samples processed in each iteration during model training. For this study, batch sizes of 4, 8, and 16 were tested to investigate their impact on model convergence and computational efficiency. Different combinations of image size and batch size were examined, resulting in separate trained models for each combination. The deep learning models, specifically the DeepLabv3+ architecture, were trained for a fixed number of epochs, with 100 epochs chosen as the training duration. This configuration allowed the models to iterate through the training data, refining their weights and optimizing their performance over time. As a result, trained models were obtained for each combination of image size and batch size, serving as the outputs of the training process.
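The experimental grid can be summarized with the following sketch; load_patches and build_deeplabv3plus are hypothetical placeholders standing in for the study's data pipeline and model construction.

```python
from itertools import product

# A sketch of the experimental grid: each (image size, batch size) pair
# yields one training run of 100 epochs, six runs in total.
IMAGE_SIZES = [128, 256]
BATCH_SIZES = [4, 8, 16]
EPOCHS = 100

for size, batch in product(IMAGE_SIZES, BATCH_SIZES):
    # Hypothetical helpers, not part of the published study:
    # train_ds, val_ds = load_patches(patch_size=size, batch_size=batch)
    # model = build_deeplabv3plus(input_shape=(size, size, 3))
    # model.fit(train_ds, validation_data=val_ds, epochs=EPOCHS)
    print(f"training run: image size {size}x{size}, batch size {batch}, {EPOCHS} epochs")
```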

Evaluation Metrics
In assessing the performance of the water body segmentation models, multiple evaluation metrics were considered, including accuracy, precision, recall, and F1 score. Accuracy quantifies the proportion of correctly classified samples, taking into account both the water and non-water classes. A higher accuracy indicates a higher level of agreement between the predicted labels and the ground truth across all classes.

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1)
\]

where TP, TN, FP, and FN denote the numbers of true positive, true negative, false positive, and false negative pixels, respectively.
Precision measures the proportion of correctly classified water pixels out of all pixels predicted as water. It provides insights into the model's ability to correctly identify true water pixels, minimizing false positives.

\[
\text{Precision} = \frac{TP}{TP + FP} \qquad (2)
\]
Recall calculates the proportion of correctly classified water pixels out of all actual water pixels. It highlights the model's capability to capture all relevant water pixels, reducing false negatives.

\[
\text{Recall} = \frac{TP}{TP + FN} \qquad (3)
\]
F1 score is the harmonic mean of precision and recall, providing a balanced measure that considers both metrics. It combines precision and recall into a single value, reflecting the overall performance of the model in correctly identifying water pixels.
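For completeness, the standard formulation of the F1 score, expressed with the quantities of Equations (1)-(3), is:

\[
F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
\]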

Accuracy Plots & Loss Plots
In the study, it was observed that the accuracy of the water body segmentation models consistently improved as the number of training epochs increased. The accuracy plots exhibited an upward trend, indicating that the models became progressively more proficient at accurately classifying water and non-water regions. This improvement in accuracy can be attributed to the models' ability to learn and capture the relevant features of water bodies over time. Additionally, the loss plots showed a decreasing trend as epochs increased. The decline in loss values indicates that the models effectively minimized errors and discrepancies between the predicted and actual labels. As the models converged, the loss plots gradually flattened, suggesting that the models reached a state of stability and achieved optimal segmentation performance. The accuracy and loss plots obtained when DeepLabV3+ was trained with images of dimension 128 x 128 (Figures 2 and 3) and 256 x 256 (Figures 4 and 5) show the importance of training the models over an adequate number of epochs to enhance accuracy and minimize loss, ultimately leading to improved water body segmentation results. Plots obtained for the other parameter combinations are presented in the Appendix (Figures 6-9).

The F1 scores obtained are presented in Table 5. From the results, it can be inferred that the F1 scores varied for different image sizes and batch sizes. For the 128x128 image size, the F1 scores were 76.94% and 78.04% for batch sizes 4 and 8, respectively, while batch size 16 achieved a slightly lower F1 score of 76.95%. These scores indicate a reasonably balanced performance in terms of precision and recall for water body segmentation. For the 256x256 image size, the F1 scores were 81.08% and 83.39% for batch sizes 8 and 4, respectively, while batch size 16 achieved an F1 score of 82.05%. These higher F1 scores suggest a relatively better balance between precision and recall for the larger image size.
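Scores of this kind can be computed by directly comparing each predicted mask with its ground-truth counterpart, as in the following minimal sketch; it assumes binary masks containing 1 for water and 0 for non-water, and the helper name evaluate_mask is illustrative.

```python
# A minimal sketch of computing the four evaluation metrics by comparing
# a predicted binary mask against its ground-truth test mask.
import numpy as np

def evaluate_mask(pred: np.ndarray, truth: np.ndarray) -> dict:
    tp = np.sum((pred == 1) & (truth == 1))  # water predicted as water
    tn = np.sum((pred == 0) & (truth == 0))  # non-water predicted as non-water
    fp = np.sum((pred == 1) & (truth == 0))  # non-water predicted as water
    fn = np.sum((pred == 0) & (truth == 1))  # water predicted as non-water
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```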

Evaluation Scores
Increasing the batch size does not consistently lead to improved metric scores. Specifically, for images sized 128 x 128 pixels, a batch size of 8 performs best, while for 256 x 256-pixel images, a batch size of 4 yields optimal results. Notably, the combination of a 256 x 256 image size and a batch size of 4 consistently outperforms the other configurations in terms of the achieved metrics. These insights contribute to the ongoing exploration of efficient deep learning techniques and offer valuable guidance for enhancing model performance in image-related tasks.