THE EFFECT OF CONTRAST ENHANCEMENT ON EPIPHYTE SEGMENTATION USING GENERATIVE NETWORK

The performance of deep learning-based image segmentation depends heavily on two major factors: 1) the organization and structure of the architecture used to train the model and 2) the quality of the input data used to train the model. The input image quality and the variety of training samples strongly influence the features derived by the deep learning filters for segmentation. This study focuses on the effect of image quality in a natural dataset of epiphytes captured using Unmanned Aerial Vehicles (UAV) when segmenting the epiphytes from other background vegetation. The dataset used in this work is highly challenging in terms of pixel overlap between the target and the background to be segmented, the occupancy of the target in the image, and shadows from nearby vegetation. The study used four different contrast enhancement techniques to improve the image quality of low contrast images from the epiphyte dataset. The datasets enhanced with the four methods, together with the unenhanced dataset, were used to train five different segmentation models. The segmentation performance of the five models is reported using the structural similarity index (SSIM) and intersection over union (IoU) scores. The study shows that epiphyte segmentation performance is highly influenced by the input image quality, and recommendations based on the four techniques are given for experts working on segmentation with natural datasets like epiphytes. The study also found that the occupancy of the target epiphyte and of the vegetation strongly influences the performance of the segmentation model.


INTRODUCTION
Computer vision systems developed for object identification tasks require a large number of training images. CIFAR-10 is a large dataset with 60,000 colour images distributed over 10 unique classes, and many studies have utilized this dataset for object detection and recognition tasks. The ImageNet dataset, consisting of thousands of images, is used for testing the performance of various Deep Learning (DL) algorithms for image classification and object identification. A medical application developed by Abedella et al. (2021) used a chest X-ray dataset consisting of 15,250 images for segmenting pneumothorax using U-Net with EfficientNet and ResNet architectures. The KITTI vision dataset (180 GB) is used for object detection and navigation applications (Geiger et al., 2013). Hammoudi et al. (2020) used the "CheckYourMask" dataset (60k images) for validating the correct wearing of masks to avoid the spread of COVID-19. DL networks capture relevant features from the training dataset at multiple scales. Hence training images must contain information about the target object from different perspectives, such as look angle, varying light conditions, scale variants and all possible varieties. Less diverse training images may carry spatiotemporally correlated information about the target. The visual similarity among the image samples is the major drawback of such datasets and will prevent DL algorithms from deriving unique and distinctive features. This scenario will also reduce the number of distinctive samples in the test dataset. Shahinfar et al. (2020) analysed the effect of sample size and variety of images on training a DL algorithm for a wildlife monitoring application. Including rich information about the target(s) during the training stage will lead to better learning (or training) of the DL network and to successful identification of the target(s) in the test images. Hu et al. (2021) reported that, in addition to the number of images used for training, their quality influences the learning of a DL network. Training the FasterRCNN and MaskRCNN networks to identify and segment weeds in drone images was influenced by factors like pixel resolution, overexposure, Gaussian blur, motion blur and noise. Nazare et al. (2018) and Zhou et al. (2017) have also reported the importance of the quality of images used for training DL networks. Hence the efficiency and reliability of a DL model depend on the number, the diversity of information content, and the quality of the input or training images. In general, a higher number of images (e.g., a few thousand), rich or diverse information content, and good quality will result in better learning.
However, collecting thousands of good quality images containing rich information about the target(s) is not feasible in some applications. Acquiring images of rare plants, animals, objects, or phenomena poses several challenges. For example, certain rare plants grow only in select geographic locations that may also be difficult to access. Similarly, acquiring digital images of rare and endangered animals such as the black rhino, giant panda and Indian tiger is challenging. Acquiring thousands of high quality images is therefore not feasible for all applications.
Training DL network(s) to identify epiphytes is challenging due to the limited amount of data available for training. These plants grow on other trees and in locations that are difficult to access. Botanists build temporary structures next to the target, use cranes, and in many cases climb the trees manually, which carries greater risk. For these reasons, acquiring a large number of good quality images is risky, time consuming, and economically infeasible. Epiphyte datasets are usually small (ranging from 100 to 200 images per family) due to limitations in image acquisition. In such scenarios, excluding poor quality images from a small sample is not feasible, and their inclusion may affect DL algorithm training.
Under these circumstances DL algorithms must be trained with fewer images of uneven quality. Alternatively, low-quality images can be pre-processed to enhance their quality. The pre-processing steps are designed to suppress undesired information and bring out relevant features for further processing and analysis. However, pre-processing may sometimes add spurious information rather than improve the quality of the images. Numerous image enhancement methods exist, and they are mainly classified into two categories based on the level of enhancement: a) global and b) local enhancement methods. Global techniques are fast and simple and are suitable for overall enhancement of the images. In local enhancement, a small window slides sequentially over every pixel of the input image, and only the pixels that fall under the window are enhanced. Local enhancement is more effective in terms of image quality, but it is time consuming.
The objective of this study was to assess the performance of conditional generative adversarial networks (CGAN) trained with images pre-processed by four different image contrast enhancement methods. These methods included one global enhancement technique, Histogram Equalization (HE), and three local enhancement methods: Adaptive Histogram Equalization (AHE), Dynamic Histogram Equalization (DHE) and the Exposure Fusion Framework (EFM).

Effect of Image Enhancement on CNN
Convolutional neural networks (CNN) are widely used in various computer vision applications such as image classification, segmentation, and recognition. CNNs derive the best features from a good quality dataset. Image preprocessing techniques are widely applied to improve image quality prior to CNN training. Rodríguez-Rodríguez et al. (2021) and Chen (2019) reported that enhancement methods can improve CNN learning, while a few methods may degrade it. An enhancement method intended to bring out suppressed information can sometimes introduce distortions into the images. Li et al. (2021) investigated the effect of image contrast enhancement while training VGG16 and FastRCNN networks for pistol detection. Aravind et al. (2020) optimized a CNN for a road lane detection application. In addition, Rahman et al. (2021) analysed the effect of image enhancement while training a chest X-ray segmentation model using U-Net. Setiawan and Agung (2020), Dimililer et al. (2016) and Xu et al. (2020) also reported the effect of image enhancement on training CNN-based networks.

Conditional GAN
In this study, a variant of generative adversarial networks (GAN) called the Conditional GAN is trained on the contrast enhanced epiphyte images. GANs are generative models comprising two neural networks called the Generator and the Discriminator (Goodfellow et al., 2020). The generator network captures the data distribution, and the discriminator network estimates the probability that a sample came from the training set rather than from the generator. In an unconditional GAN there is no control over the modes of the data generated. A conditional GAN conditions a normal GAN on input data such as class labels and thereby directs the data generation process (Mirza et al., 2014). Conditional GANs are widely used in image generation applications. Shashank et al. (2020) used a CGAN for identifying epiphytes in drone photos, where the CGAN generates the output label for an input image. Jiao et al. (2019) evaluated the potential of the CGAN for plant leaf recognition. The capability of the CGAN to generate new data from existing data paves the way for data augmentation in many image processing applications that train deep learning algorithms. Zhu et al. (2018), Douzas and Bacao (2018), Bird et al. (2022) and Sanjay et al. (2021) have described data augmentation techniques applied to various kinds of data.
Apart from their data generation capability, conditional GANs can generate good data from a limited number of training samples. A conditional GAN-based image-to-image translation model named pix2pix shows the potential of the method to work with less training data and fewer iterations (Isola et al., 2017). That study addressed image generation case studies such as synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images. The potential of such an image-to-image translation model for epiphyte segmentation with few training samples was studied by our research team (Shashank et al., 2020). The present study addresses the impact of training data quality, in terms of image contrast, on training a CGAN.
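For reference, the conditional objective introduced by Mirza et al. (2014) can be written as below. Here x denotes a real sample, y the conditioning information, and z the generator noise. Reading x as an annotated label image and y as the input image is an interpretive assumption about the segmentation setup. The exact loss used for the epiphyte model, including any additional reconstruction term of the kind used in pix2pix, is described in Shashank et al. (2020) and is not reproduced here.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x \mid y) \big]
  + \mathbb{E}_{z \sim p_{z}(z)} \big[ \log \big( 1 - D(G(z \mid y)) \big) \big]
```

The discriminator D is rewarded for assigning high probability to real pairs and low probability to generated ones, while the generator G is trained to make its conditioned outputs indistinguishable from real samples.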

The Epiphyte Dataset
The epiphyte dataset used in this study was acquired in Braulio Carrillo National Park in Costa Rica (Sivanpillai et al., 2019). DJI Phantom and DJI Spark drones were used for collecting RGB images. The collected images were grouped into different categories based on the species present in the images. For this study, our research group selected Werauhia kupperiana for the segmentation task. The team has developed a conditional generative adversarial network (CGAN) model for segmenting the selected species from other background vegetation. A sample input image and the corresponding annotation are shown in Figure 1. All images used for training the Conditional GAN-based architecture were annotated pixel-wise to separate the target epiphyte from other background vegetation. The annotation of the input images was completed by experts using an open-source annotation tool called LabelMe (Torralba et al., 2010).

Selecting Low Contrast Images
The original dataset used for training contained 119 images of the target epiphyte. The model was trained with a mix of low contrast and high contrast images. The low contrast images were fewer than the optimal contrast images. The initial step in this study was to identify the low contrast images in the dataset and enhance their contrast. After enhancing the low contrast images, a new segmentation model was trained to analyse the effect of the enhancement techniques. Low contrast images were selected in two steps: 1) manual visual inspection, and 2) analysis of the histogram signatures of the manually selected images to understand the distribution of pixel values. The manual visual inspection step involved categorizing the images according to shading of the target by nearby vegetation, occupancy of target and background, occlusion of the target by nearby vegetation, and light intensity variation. Images were considered low quality if their pixel values were skewed towards low values. The histogram signatures for a low contrast and an optimal contrast image from the dataset are depicted in Figure 2.
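The histogram step can be illustrated with a minimal Python sketch that flags images whose pixel values are concentrated at low intensities. The function names, the cut-off of 100 and the 75% fraction are illustrative assumptions and are not values from the study, where the final decision relied on manual inspection.

```python
def intensity_histogram(gray, bins=256):
    """Count how many pixels fall on each intensity level (0..bins-1)."""
    hist = [0] * bins
    for row in gray:
        for value in row:
            hist[value] += 1
    return hist


def dark_pixel_fraction(gray, dark_limit=100):
    """Fraction of pixels with intensity below dark_limit (assumed cut-off)."""
    hist = intensity_histogram(gray)
    return sum(hist[:dark_limit]) / sum(hist)


def is_low_contrast_candidate(gray, dark_limit=100, min_dark_fraction=0.75):
    """Flag an image for manual review when most of its pixels are dark.

    Both thresholds are hypothetical and would need tuning per dataset.
    """
    return dark_pixel_fraction(gray, dark_limit) >= min_dark_fraction


# Tiny synthetic 8-bit grayscale images, stored as lists of rows.
dark_image = [[10, 20, 30, 40], [15, 25, 35, 220]]
bright_image = [[120, 150, 180, 210], [130, 160, 190, 40]]

print(is_low_contrast_candidate(dark_image))    # True: 7 of 8 pixels are below 100
print(is_low_contrast_candidate(bright_image))  # False: 1 of 8 pixels is below 100
```

A screen of this kind can only shortlist candidates. It cannot tell whether the dark pixels belong to the target plant or to the background, which is why manual inspection remains necessary.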

Figure 2.
A sample low contrast image (a) and a sample optimal contrast image (c), with the corresponding histogram signatures for the low contrast (b) and optimal contrast (d) images.
Selecting low contrast images must be aided by manual inspection. Figure 2(a) is a low contrast image in which both the target plant and the background are dark. In Figure 2(b) the histogram values are skewed towards the low intensity region. The image presented in Figure 2(c) has optimal contrast, with a few dark background pixels. This is evident in the corresponding histogram signature, Figure 2(d), where the high intensity pixels are distributed between 100 and 250 (target), whereas the values between 0 and 100 are due to low contrast background regions. Manual visual inspection of the images along with their histogram signatures helps to identify whether the low contrast regions are due to the target plant or to other background vegetation. This process was used for selecting 23 low contrast images from the epiphyte dataset (n = 119), while the remaining 96 were identified as optimal contrast images.

Image Quality Enhancement
Histogram-based methods are widely used for contrast enhancement in many computer vision applications (Harichandana et al., 2020). The histogram-based methods used in this study act on global and local regions of the image while enhancing the low contrast images (Hussain et al., 2018; Patel et al., 2013). The exposure fusion framework is more robust in enhancing the contrast at the global level because it considers the camera parameters while processing. The enhancement algorithms are discussed in more detail in the following subsections.

Histogram Equalization (HE):
The histogram of an image shows the distribution of its pixel intensity values. The histogram of a low contrast image has most of its intensity values concentrated towards the low end. The HE method applies global enhancement and distributes the intensity values more evenly across the image. In the low contrast epiphyte images, both the target epiphyte and the background vegetation contribute to the low intensity values, or in some cases the target plant regions are dark and the background is bright. Since HE is a global contrast operation, both subjects in our epiphyte images are enhanced (Patel et al., 2013; Hussain et al., 2018).

Adaptive Histogram Equalization (AHE):
Adaptive histogram equalization is an enhanced version of HE. This method divides the whole image into individual regions based on intensity variations and computes a histogram for each region. AHE focuses on the individual local regions of the input image to enhance contrast, which results in improved edge definition and isolation of regions of varying contrast (Pizer et al., 1987).

Dynamic Histogram Equalization (DHE):
The Dynamic Histogram Equalization method reduces the information loss during enhancement compared to HE. The DHE method computes the local minima of the histogram to isolate its sub-regions and assigns a grey level range to each division before enhancing the pixel values. DHE produces fewer anomalies in the processed images, such as a washed-out appearance, chequerboard effects and other adverse artefacts (Abdullah-Al-Wadud et al., 2017).

Exposure Fusion Framework Method (EFM):
The enhancement methods discussed in the previous subsections work directly on the pixel values to correct the anomaly. The anomaly can also arise from external factors such as lighting conditions and the influence of nearby objects. These external factors change the values of the pixels from their natural intensity values. To correct the pixels precisely, we also need a model that considers inputs from the image acquisition hardware. Although the images were acquired using various devices that differ in specifications, common enhancement techniques are applied to all of them. The exposure fusion framework utilizes illumination estimation techniques and a camera response model during the synthesis of images. Using these parameters, a synthetic image is generated which is well exposed in the regions where the original image is underexposed. Finally, the synthetic image and the original input image are fused to obtain the enhanced output (Ying et al., 2017).
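Of the four methods described in this section, global HE is the simplest to illustrate. The following is a minimal sketch of textbook histogram equalization on an 8-bit grayscale image stored as a list of rows. The function name is hypothetical. The study's images are RGB, and how colour channels were handled is not stated, so the grayscale treatment here is an assumption.

```python
def equalize_histogram(gray, levels=256):
    """Global histogram equalization of a grayscale image (list of rows)."""
    # Histogram of intensity values.
    hist = [0] * levels
    for row in gray:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of the histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to stretch.
        return [row[:] for row in gray]

    # Map each level so that the cumulative distribution becomes roughly linear.
    lookup = [
        max(0, round((c - cdf_min) * (levels - 1) / (total - cdf_min)))
        for c in cdf
    ]
    return [[lookup[value] for value in row] for row in gray]


dark = [[10, 20], [30, 40]]
print(equalize_histogram(dark))  # [[0, 85], [170, 255]]
```

The example shows the global behaviour discussed above: every intensity level is remapped by the same lookup table, so target and background pixels with the same value are always stretched identically.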

Training CGAN with Enhanced Images
The conditional generative adversarial network architecture implemented in an earlier study (Shashank et al., 2020) was used to train the new models: four with images enhanced by the four different methods and one without any enhancement. The input to the architecture is a pair of images: an input image and its corresponding annotated image. The objective of the generator network is to learn to generate label images, referred to as fake samples, and the discriminator network is responsible for classifying the samples from the generator as real or fake, as depicted in Figure 3. After enhancement, the selected 23 low contrast images were combined with the remaining 96 images to make the good quality set. Finally, out of the 119 images, 96 were used for training, 13 for testing and 10 for validation during training. To evaluate the effect of enhancement, the 23 enhanced images were distributed as 4 in the test set, 18 in the training set, and 1 in the validation set. The epiphyte segmentation network is designed in such a way that it forces the network to generate label (segmented) images from input images. Both networks receive the real input image, depicted as 'y' in Figure 3, as the conditional parameter, along with the corresponding label. The training procedure and details of the CGAN network used for segmentation are given in Shashank et al. (2020).

Structural Similarity Index Measure (SSIM):
Evaluating the structural information of the predicted epiphyte segmentation labels is crucial to understand whether the learned model preserved such information in its predictions. SSIM is widely used for image comparisons. The strong dependencies among intensity values in an image carry important structural information about the visual scene. The SSIM measurement is based on the comparison of three major components: luminance, contrast, and structure. SSIM values range between -1 and 1, and a value close to 1 indicates that the images are similar (Wang et al., 2004).
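The combination of the three components can be sketched as follows, using the standard closed form from Wang et al. (2004) computed once over the whole image. This is a simplification: common implementations compute the index over local sliding windows and average the results, and which implementation the study used is not stated. The function name and the default constants k1 = 0.01 and k2 = 0.03 are the conventional choices, taken here as assumptions.

```python
def global_ssim(img_a, img_b, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized grayscale images."""
    a = [float(v) for row in img_a for v in row]
    b = [float(v) for row in img_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and of equal size")

    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((v - mu_a) * (v - mu_a) for v in a) / n
    var_b = sum((v - mu_b) * (v - mu_b) for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n

    # Small constants that stabilise the ratios when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    luminance = (2 * mu_a * mu_b + c1) / (mu_a * mu_a + mu_b * mu_b + c1)
    contrast_structure = (2 * cov + c2) / (var_a + var_b + c2)
    return luminance * contrast_structure


mask = [[0, 255], [255, 0]]
inverted = [[255, 0], [0, 255]]
print(global_ssim(mask, mask))      # 1.0 for identical images
print(global_ssim(mask, inverted))  # negative: the structure is reversed
```

The second call shows why the range extends below zero: when two images are anti-correlated, the covariance term becomes negative.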

Intersection over Union (IoU):
Intersection over union is another measure used for comparing the ground truth mask and the predicted mask. In this study, the images contain two classes: target epiphyte and background. In binary or multi-class segmentation tasks, the IoU is usually computed for each class and then averaged over the classes. We computed the IoU score only for the target epiphyte class.
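A minimal sketch of the single-class IoU described above is given below. The function name, the label value 1 for the epiphyte class, and the convention of returning 1.0 when the class is absent from both masks are assumptions made for illustration.

```python
def iou_for_class(truth, pred, target=1):
    """IoU between ground truth and predicted label masks for one class."""
    intersection = 0
    union = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            in_truth = t == target
            in_pred = p == target
            intersection += in_truth and in_pred
            union += in_truth or in_pred
    # Assumed convention: a class absent from both masks counts as a perfect match.
    return intersection / union if union else 1.0


# 1 = target epiphyte, 0 = background.
truth = [[1, 1, 0], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 1]]
print(iou_for_class(truth, pred))  # 0.5: 2 shared pixels out of 4 in the union
```

Restricting the score to the target class avoids a high background IoU masking a poor epiphyte segmentation when the target occupies a small part of the frame.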

Enhanced Image Evaluation Method
To analyse the effect of each contrast enhancement method on the low contrast input images, a similarity index was measured between each input image and its enhanced image. This technique identifies the major differences between the original image and the enhanced image. The modified regions are then identified in the enhanced images and marked with bounding boxes. The method also generates a binary image of the input image in which white pixels indicate the modified pixels and black pixels correspond to regions left untouched during enhancement. This helps to visualize the effect of enhancement on the low contrast images. The image difference evaluation was implemented with OpenCV and Python. The inputs to the algorithm are the original low contrast image and the corresponding enhanced image, from which the difference and SSIM scores are computed. From the difference and SSIM score, contours are identified for the regions that underwent changes, and a bounding box is drawn around each to localize the region. This evaluation was performed on all the test images enhanced with the 4 different contrast enhancement methods to understand how pixel modifications during contrast enhancement affected model learning.
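The idea of the binary change mask and bounding box can be sketched without external libraries as below. This is not the OpenCV implementation used in the study: it swaps the SSIM-based difference for a plain absolute difference, and it draws a single box around all modified pixels instead of one box per contour. The function names and the threshold of 10 grey levels are hypothetical.

```python
def change_mask(original, enhanced, threshold=10):
    """Binary mask: 255 where enhancement changed a pixel by more than threshold."""
    return [
        [255 if abs(o - e) > threshold else 0 for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(original, enhanced)
    ]


def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) around all white pixels, or None."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


original = [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
enhanced = [[10, 10, 10, 10], [10, 60, 80, 10], [10, 10, 15, 10]]

mask = change_mask(original, enhanced)
print(mask)                # [[0, 0, 0, 0], [0, 255, 255, 0], [0, 0, 0, 0]]
print(bounding_box(mask))  # (1, 1, 2, 1)
```

Comparing such a mask with the annotation of the target epiphyte indicates whether an enhancement method mostly modified the target, the background, or both.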

Contrast Enhanced Images and Histogram
The four contrast enhancement techniques were applied to the 23 low contrast images identified in the epiphyte dataset. Figure 4 shows the enhanced versions of a sample low contrast image and the corresponding histogram signatures after enhancement with the four different techniques. The original input image without enhancement, shown in column 1 of Figure 4, had a histogram signature skewed towards the left (low intensity) side, representing the dark pixels, while the few pixels on the right side correspond to the brighter background pixels.
The histogram equalization method stretched the pixel values across all ranges, and the pixels were modified globally throughout the image. The AHE algorithm acted at the local level: the dark pixels in the original image were scaled up by a small value, and the brighter pixels in the original image became much brighter. AHE altered the original histogram by shifting more pixels towards higher intensity values. AHE was influenced by the high intensity pixels in the original image, which forced the algorithm to shift the dark epiphyte region to higher values. DHE was able to bring a clear separation between the dark and the brighter pixels in the image. The EFM enhancement was global, and its signature was squeezed towards the centre. EFM failed to separate the target from the background.

CGAN Epiphyte Segmentation Model Performance
Five CGAN models, trained with the low contrast images and with images enhanced by the 4 different methods, were evaluated on the test images. Average SSIM and IoU scores over all test set images were computed for the 5 models (Table 1).

DISCUSSION
This section details the findings of the study, to clarify the overall performance and the importance of such image enhancement methods when working with natural datasets like epiphytes.

Findings from Enhanced Image Evaluation
As per the SSIM and IoU scores (Table 1), the average scores were highest for the DHE method, and this method performed well for images EPI-3 and EPI-4 (Table 2). Figures 5 and 6 show the similarity measure representations and the masks that highlight the regions changed by the different enhancement techniques. The enhanced image evaluation was performed on the 4 low contrast images in the test set. The evaluation grouped the 4 low contrast images into 2 categories: 1) images in which the target epiphytes and other vegetation occupy the frame equally and 2) images in which epiphytes occupy most of the frame and the background vegetation is far away. Images EPI-1 and EPI-2 fall under category 1, and images EPI-3 and EPI-4 fall under category 2.

Effect of contrast enhancement
Contrast enhancement algorithms modify pixel values. They change an image either in one or more local regions or globally throughout the image. The input images for enhancement may have varying pixel intensities in different regions. Enhancement algorithms that act at the local level ensure that the enhanced pixels remain close to their original distribution. On the other hand, enhancement algorithms that act at the global level enhance the pixels uniformly throughout the image.

Insights from enhanced image evaluation
DHE performed well for two images, for which the masks show minimal modification of pixels (Figure 5). Moreover, pixel modification occurred on the target epiphyte regions, resulting in their clear separation from the background. Fewer regions underwent changes when the DHE method was applied than with the other methods. The two test input images were selected such that the target epiphyte was in focus while the background was out of focus. In the input images there is a clear separation between the target and the background, and this separation was preserved after enhancement by DHE. For the other two test images, which fall under category 1 and are depicted in Figure 6, all the methods performed similarly to one another. Figure 6 shows images in which the target and the background vegetation are equally in focus and not clearly separated from each other, unlike the input images in Figure 5. The binary masks generated for the changes after contrast enhancement of these images show that pixel modification occurred in both the target and the background vegetation. There was a high overlap between the modified pixels of the target plant and those of the background vegetation in the enhanced images. This analysis also reveals that the trained model for target epiphyte segmentation can yield lower IoU and SSIM scores if the contrast enhancement algorithm fails to separate the target from the background vegetation in low contrast images. Also, if the input images have low contrast and the target epiphyte and background vegetation occupy the frame equally, then any enhancement method that produces a high overlap between these classes after pixel modification will result in low SSIM and IoU scores.
Further improvements are possible with enhancement techniques that bring a clear separation between the target and the background to be segmented. Enhancement algorithms that can localize regions of low contrast, later assisted by a guided methodology, could bring considerable improvement. Further, DL-based image enhancement techniques could be explored. To increase the number of image samples in the training, test and validation sets, deep generative adversarial augmentation techniques can be employed in future.

CONCLUSIONS
The quality of the input images used for training affects how a CGAN learns to identify epiphytes. The presence of low contrast images in small datasets can impair the network's learning process. Contrast enhancement techniques can be used to improve the quality of such images. Images enhanced with DHE achieved comparatively higher IoU and SSIM scores when segmenting epiphytes. Apart from image contrast, the occupancy of the target (epiphytes) and the background vegetation in an RGB image plays an important role. Of the four enhancement algorithms evaluated in this study, DHE performed well for low contrast images with good target occupancy. On the other hand, HE, AHE and EFM failed to increase the IoU and SSIM scores when the target (epiphytes) and background occupied an image equally. Contrast enhancement algorithms applied to epiphyte data should be able to bring a clear separation between the target and the background vegetation. Natural datasets such as the epiphyte dataset may have an uneven distribution of low contrast pixels across the target and background classes. This uneven distribution of contrast is challenging for the enhancement algorithms, which thereby fail to separate the background and foreground information.