BUILDING EDGE DETECTION FROM VERY HIGH-RESOLUTION REMOTE SENSING IMAGERY USING DEEP LEARNING

Detecting building edges is crucial for building information extraction and description. Extracting structures from large-scale aerial images has been used in cartography for years, and with commercially available high-resolution satellites, many aerial photography applications can now use satellite imagery instead. Edge detection focuses on pinpointing distinct transitions between greyscale image regions and attributing them to the underlying physical processes. Detecting building boundaries from very high-resolution (VHR) remote sensing data is essential for many geo-related applications, such as urban planning and management, surveying and mapping, 3D reconstruction, motion recognition, image registration, image enhancement and restoration, image compression, and more. The rapid evolution of convolutional neural networks (CNNs) has led to substantial breakthroughs in edge detection in recent years. Edges in digital images are characterized by sharp, localized changes in brightness. In most cases, edge detection requires some form of image smoothing followed by differentiation. However, differentiation is an ill-conditioned problem, and smoothing leads to information loss, so it is challenging to create an edge detection method that works everywhere and adapts to any subsequent processing stage. Consequently, numerous edge detectors have been created throughout the development of digital image processing, each with its own mathematical and algorithmic properties, driven by application needs and the subjective nature of edge definition and characterization. We propose a deep learning technique based on CNNs, which offers a promising way to automatically learn and extract features from very high-resolution remote sensing imagery, leading to more accurate and efficient building edge detection.


INTRODUCTION
Building extraction from high-resolution remote sensing data plays a crucial role in urban mapping, infrastructure development, disaster response, and related fields (Prabhakar, 2023). Accurate and efficient detection of building edges is essential for tasks such as building footprint delineation, change detection, and 3D modeling (Giannarou & Stathaki, 2011). Traditional methods for building edge detection often rely on handcrafted features and heuristics, which limits their ability to handle complex building structures and varying environmental conditions (Reda & Kedzierski, 2020).
Edge detection aims to identify and locate boundaries between different objects or regions in an image. It is an essential step in many image processing and computer vision tasks, as it provides crucial cues for subsequent analysis and interpretation (Ziou & Tabbone, 1998). Traditional edge detection algorithms, such as the Canny, Sobel, and Roberts operators, have been widely used in various domains. However, these methods often struggle with very high-resolution remote sensing imagery because of its complex spatial structures and fine details (Wen et al., 2021).
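To make the contrast with learned detectors concrete, the following minimal NumPy sketch shows how a classical gradient operator such as Sobel produces an edge map: the image is filtered with two fixed 3×3 kernels, the gradient magnitude is computed, and a hand-chosen threshold is applied. The function names, the synthetic test patch, and the threshold value of 0.5 are illustrative assumptions, not part of the proposed method.

```python
import numpy as np

# Fixed Sobel kernels for horizontal (x) and vertical (y) derivatives.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """Apply a 3x3 kernel as a cross-correlation with edge-replicated padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_edges(img, threshold=0.5):
    """Binary edge map from the max-normalised Sobel gradient magnitude.

    `threshold` (in [0, 1]) is a hypothetical, hand-tuned value.
    """
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max()
    return mag >= threshold

# Synthetic "roof" patch: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0
edges = sobel_edges(image)
print(edges.astype(int))
```

The output marks a band of pixels on both sides of the square's outline and nothing in the flat interior or background; the fixed kernels and the threshold are exactly the hand-crafted choices that learned detectors replace.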
To overcome the limitations of traditional approaches, recent research has turned to deep learning techniques, particularly Convolutional Neural Networks (CNNs), which have demonstrated remarkable performance in various computer vision tasks. CNNs are ideally suited for analyzing remote sensing imagery due to their ability to capture and learn complex spatial patterns and hierarchical representations (Wen et al., 2023). They excel at automatically extracting discriminative features from raw data, making them highly suitable for edge detection tasks (Lu et al., 2018). By leveraging large-scale annotated data sets and hierarchical feature extraction, deep learning models can effectively capture fine-grained details and spatial relationships, enabling accurate and robust building edge detection (Xia et al., 2021).
The application of deep learning in building edge detection has witnessed remarkable progress driven by advancements in network architectures, training strategies, and the availability of high-resolution remote sensing data. Several deep learning-based approaches have been proposed, ranging from fully supervised methods to weakly supervised and unsupervised techniques (Wen et al., 2021). These approaches often employ CNNs, recurrent neural networks (RNNs), or a combination of both to learn discriminative features and spatial dependencies for building edge detection (Xia et al., 2021).
Despite the significant achievements, challenges and research opportunities remain in the field of building edge detection using deep learning. One major challenge is the presence of complex building structures, such as irregular shapes, varying roof types, and occlusions caused by vegetation or other structures (Lu et al., 2018). Effectively capturing the intricate details and boundaries of these buildings requires the development of deep learning architectures that can handle such complexity.
Another challenge is the limited availability of high-quality annotated datasets for training deep learning models specifically tailored for building edge detection (Wu et al., 2023). Collecting and annotating large-scale datasets with accurate building edge labels can be time-consuming and resource-intensive. Exploring techniques such as transfer learning, domain adaptation, and active learning can help overcome this challenge and enable the development of more robust and generalized deep learning models (Xu et al., 2018; Li & Dong, 2022).
Moreover, the interpretability and explainability of deep learning models for building edge detection remain important research areas. As deep learning models often function as black boxes, understanding the reasoning behind their predictions and incorporating domain knowledge (Montavon et al., 2018) becomes crucial for building trust and confidence in the results.
In this study, we propose a novel approach for building edge detection from very high-resolution remote sensing imagery using CNNs. Our objective is to develop a robust and accurate edge detection model that can effectively handle the unique challenges posed by high-resolution satellite or aerial imagery. By leveraging the power of CNNs, we aim to overcome the limitations of traditional algorithms and significantly enhance the quality and precision of edge detection results.
The key contributions of our work can be summarized as follows:
1. Designing a tailored CNN architecture specifically optimized for edge detection in very high-resolution remote sensing imagery.
2. Collecting and curating a large-scale dataset of high-resolution remote sensing images annotated with ground truth edge maps for model training and evaluation.
3. Conducting extensive experiments and evaluations to assess the performance of our approach.
4. Demonstrating the practical utility of our model through real-world applications, such as urban feature extraction, land cover classification, and change detection.
By addressing the challenges of edge detection in very high-resolution remote sensing imagery and leveraging the capabilities of CNNs, our research aims to advance the field of remote sensing image analysis and provide valuable insights for decision-making processes in various domains. The proposed model has the potential to enhance the accuracy and efficiency of remote sensing applications, contributing to improved resource management, urban planning, and environmental monitoring.
The rest of the paper is organized as follows: Section 2 provides an overview of related work and existing deep-learning methodologies in building edge detection. Section 3 discusses the data acquisition and preparation. Section 4 presents the advantages and disadvantages of edge detection. Section 5 highlights the methodology adopted. Section 6 discusses the experiments and results achieved. Finally, Section 7 concludes the paper and summarizes the key findings.

Building Extraction from VHR Images
Various traditional algorithms and theories have been employed for building detection, but their extraction results face practical challenges (Lu et al., 2018). The impressive performance of convolutional neural networks (CNNs) in tasks such as object classification, object extraction, semantic segmentation, and edge detection has prompted researchers to explore deep learning-based methods for building extraction (Ji et al., 2019). Ji et al. (2019) introduced a scale-robust CNN structure to extract buildings from high-resolution aerial and satellite images. Their approach applied two dilated convolutions to the two lowest-scale layers to enlarge the field of view and incorporate semantic information from large buildings, and used a multi-scale aggregation strategy to improve segmentation accuracy. In another study, Hu and Guo (2019) treated each building as an independent instance and employed Mask Scoring R-CNN, which adds a mask intersection-over-union (IoU) head to assess and improve the quality of the predicted masks; this approach yielded better building extraction results than Mask R-CNN. Lu et al. (2018) used the RCF method to generate an edge strength map for buildings and applied a geomorphological concept to optimize the building edges. Additionally, Reda and Kedzierski (2020) proposed the FER-CNN algorithm, which improved the accuracy of building detection and classification.
They also incorporated the Ramer-Douglas-Peucker (RDP) algorithm to refine the shape of the detected buildings. While pixel-level accuracy is important for segmentation, the ultimate goal is accurate object identification with precise boundaries (Hossain & Chen, 2019).
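Because the RDP algorithm is used to regularize detected building outlines, a minimal recursive sketch may help: it keeps the vertex farthest from the chord between the two endpoints whenever that distance exceeds a tolerance `epsilon`, recurses on both halves, and otherwise discards the intermediate vertices. This is a generic textbook version written for illustration, not the implementation used by Reda and Kedzierski (2020); the sample outline is hypothetical, and `epsilon` is in the same units as the input coordinates (e.g., pixels).

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate chord: fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length

def rdp(points, epsilon):
    """Simplify a polyline with the Ramer-Douglas-Peucker algorithm."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the intermediate vertex farthest from the start-end chord.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], start, end)
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep that vertex and simplify both halves recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right  # drop the duplicated split vertex
    # Every intermediate vertex lies within epsilon of the chord.
    return [start, end]

# A jagged L-shaped building edge traced from an edge map (pixel coordinates).
outline = [(0, 0), (1, 0.2), (2, -0.1), (3, 0.1), (4, 0),
           (4.1, 1), (3.9, 2), (4, 3)]
print(rdp(outline, epsilon=0.5))  # [(0, 0), (4, 0), (4, 3)]
```

With a tolerance of half a pixel, the jitter along both walls is removed and only the two wall endpoints and the corner survive, which is the kind of regularized footprint RDP is used for.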

Deep Learning-Based Edge Detection
Edge extraction represents a fundamental challenge in computer vision, and the advancements made by CNN-based edge detection algorithms have significantly surpassed the performance of traditional computer vision methods (Bischke et al., 2019). The holistically nested edge detection (HED) network addresses the issue of blurred edges in natural images by leveraging deep supervision to automatically learn rich hierarchical feature representations (Xie & Tu, 2017).
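Deep supervision in HED means that every side output, as well as the fused output, receives its own loss against the same ground truth edge map. Because edge pixels are a small minority, Xie & Tu (2017) weight the cross-entropy by class frequency: edge pixels are weighted by beta = |Y-| / |Y| (the fraction of non-edge pixels) and non-edge pixels by 1 - beta. The NumPy sketch below follows that formulation under stated assumptions: predictions are probabilities already upsampled to the label resolution, the side-output weights default to 1, and the helper names and toy data are hypothetical.

```python
import numpy as np

def class_balanced_bce(prob, label, eps=1e-7):
    """Class-balanced binary cross-entropy in the style of HED.

    prob  : predicted edge probabilities in (0, 1), shape (H, W)
    label : binary ground truth edge map, shape (H, W)
    """
    label = label.astype(bool)
    n_total = label.size
    n_pos = label.sum()
    beta = (n_total - n_pos) / n_total       # fraction of non-edge pixels
    prob = np.clip(prob, eps, 1.0 - eps)     # avoid log(0)
    loss_pos = -beta * np.log(prob[label]).sum()
    loss_neg = -(1.0 - beta) * np.log(1.0 - prob[~label]).sum()
    return loss_pos + loss_neg

def deep_supervision_loss(side_probs, fused_prob, label, side_weights=None):
    """Weighted sum of side-output losses plus the fused-output loss."""
    if side_weights is None:
        side_weights = [1.0] * len(side_probs)  # assumption: equal weights
    side = sum(w * class_balanced_bce(p, label)
               for w, p in zip(side_weights, side_probs))
    return side + class_balanced_bce(fused_prob, label)

label = np.zeros((4, 4))
label[:, 1] = 1                                # one vertical edge
good = np.where(label == 1, 0.9, 0.1)          # confident, correct prediction
bad = np.full((4, 4), 0.5)                     # uninformative prediction
print(class_balanced_bce(good, label) < class_balanced_bce(bad, label))  # True
print(deep_supervision_loss([good, bad], good, label))
```

The balancing matters because, without it, a network could reach a low loss by predicting "no edge" almost everywhere on building imagery where boundary pixels are sparse.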
However, HED solely utilizes CNN features from the last layer of each convolutional stage, limiting its ability to fully exploit the rich feature hierarchy of CNNs. To overcome this limitation, the RCF network employs a fully convolutional network (FCN) to effectively integrate features from all convolutional layers, leveraging the multi-scale and multi-level information of the target (Liu et al., 2019). Dynamic feature fusion (DFF) recognizes the importance of multi-scale feature fusion in semantic edge detection. Through a weight learner, DFF adaptively assigns fusion weights to different images and positions, resulting in more accurate and clearer edge prediction results compared to fixed-weight fusion methods.
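The difference between fixed-weight fusion (as in HED and RCF) and the position-adaptive fusion that motivates DFF can be illustrated with a toy NumPy sketch. Side outputs at coarser scales are upsampled to full resolution (nearest-neighbour here, whereas real networks typically use deconvolution or bilinear interpolation) and then combined either with one global weight per side output or with per-pixel weights. The per-pixel weight logits are passed in directly and normalized with a softmax; both choices are assumptions of this sketch. In DFF the weights are predicted by a weight learner network, which is not modeled here, so this is a simplified illustration rather than the DFF architecture.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_fixed(side_logits, weights):
    """Fixed-weight fusion: one scalar weight per side output, same everywhere."""
    stacked = np.stack(side_logits)                 # (S, H, W)
    w = np.asarray(weights, dtype=float)[:, None, None]
    return sigmoid((w * stacked).sum(axis=0))

def fuse_adaptive(side_logits, weight_logits):
    """Position-adaptive fusion: per-pixel weights from a softmax over sides.

    weight_logits has shape (S, H, W); in a real network it would be
    predicted from the image features (hypothetical input here).
    """
    stacked = np.stack(side_logits)
    w = np.exp(weight_logits - weight_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)            # weights sum to 1 per pixel
    return sigmoid((w * stacked).sum(axis=0))

# Three hypothetical side outputs at strides 1, 2 and 4 for an 8x8 tile.
rng = np.random.default_rng(0)
sides = [rng.normal(size=(8, 8)),
         upsample_nearest(rng.normal(size=(4, 4)), 2),
         upsample_nearest(rng.normal(size=(2, 2)), 4)]
fixed = fuse_fixed(sides, weights=[0.5, 0.3, 0.2])
adaptive = fuse_adaptive(sides, weight_logits=rng.normal(size=(3, 8, 8)))
print(fixed.shape, adaptive.shape)   # both (8, 8)
```

With fixed weights, a fine-scale side output contributes equally at a thin roof edge and inside a large homogeneous roof; adaptive weights allow the contribution of each scale to vary with position.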
To further improve the detection of target edges at different scales, BDCN introduces a bidirectional cascade structure. It provides layer-specific edge supervision at a particular scale for each layer and employs a scale enhancement module (SEM) to generate multi-scale features. These additions enrich the multi-scale representations learned by the shallow layers, improving edge detection across various scales (He et al., 2022).
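Scale enhancement in BDCN, like the dilated convolutions used by Ji et al. (2019), relies on dilation to enlarge the receptive field without adding parameters. The short helper below computes the effective kernel extent k + (k - 1)(d - 1) of a k × k convolution with dilation rate d, and the receptive field of a stack of layers. The dilation rates and the layer stack in the example are illustrative assumptions, not the configurations used in BDCN or by Ji et al.

```python
def effective_kernel(k, d):
    """Spatial extent of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field (input pixels) of a stack of (kernel, stride, dilation) layers."""
    rf, jump = 1, 1   # jump = input-pixel spacing between adjacent outputs
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# Parallel 3x3 branches with different (hypothetical) dilation rates.
for d in (1, 4, 8, 12):
    print(f"3x3 conv, dilation {d}: spans {effective_kernel(3, d)} pixels")

# Two plain 3x3 convs, a stride-2 pooling, then a 3x3 conv with dilation 2.
print(receptive_field([(3, 1, 1), (3, 1, 1), (2, 2, 1), (3, 1, 2)]))  # 14
```

Each parallel branch keeps the nine weights of a 3 × 3 kernel while spanning 3, 9, 17, or 25 pixels, which is why dilation is a cheap way to expose a shallow layer to building-scale context.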
However, conventional supervised deep learning methods often rely on a large number of manually labeled samples, which are time-consuming and expensive to produce. Reducing the dependence of networks on labeled samples is therefore of utmost importance.

DATA ACQUISITION AND PREPARATION
To conduct our research on building edge detection from very high-resolution remote sensing imagery, we utilized the ISPRS (International Society for Photogrammetry and Remote Sensing) dataset for the Toronto region. The ISPRS dataset is widely recognized and extensively used in the remote sensing community for benchmarking and evaluating various image analysis tasks, including edge detection.
The ISPRS dataset for the Toronto region consists of high-resolution aerial images captured by airborne sensors. These images provide comprehensive coverage of the Toronto metropolitan area, enabling us to analyze and extract valuable information about the urban environment. The dataset includes images with diverse spatial resolutions, ranging from 0.1 to 1 meter per pixel, which allows us to evaluate the performance of our edge detection model under different resolution settings.
To prepare the dataset for our edge detection research, we followed a systematic procedure consisting of the following steps:
1. Data Acquisition: We obtained the ISPRS dataset for the Toronto region from the official ISPRS archives. The dataset comprises georeferenced, orthorectified images, ensuring accurate spatial registration for further analysis.
2. Data Preprocessing: Prior to training our edge detection model, we performed several preprocessing steps to ensure the quality and consistency of the dataset. These involved removing corrupt or incomplete images and applying radiometric and geometric corrections where necessary.
3. Annotation of Ground Truth Edge Maps: To train and evaluate our edge detection model, we needed ground truth edge maps corresponding to the high-resolution aerial images. The edge maps define the precise boundaries between different objects or regions in the imagery. The annotation process considered various visual cues, such as changes in color, texture, and intensity gradients, to mark the boundaries accurately.
4. Dataset Split: To facilitate training, validation, and testing of our edge detection model, we divided the dataset into distinct subsets: one portion of the images for training, a second portion for validation to fine-tune model hyperparameters, and a final portion for testing to evaluate the model's performance on unseen data. The split was designed so that the distribution of images across the subsets was representative and balanced.
5. Data Augmentation: To increase the diversity and robustness of the training data, we applied data augmentation, i.e., transformations such as rotations, translations, flips, and scaling applied jointly to the original images and their corresponding ground truth edge maps. Data augmentation helps prevent overfitting and improves the generalization capability of the edge detection model.
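Two details of the split and augmentation steps are easy to get wrong: the split should be made over whole images (or tiles) before augmentation, so that augmented copies of a test image never leak into training, and every geometric transform must be applied identically to an image and its edge map. The NumPy sketch below illustrates both under stated assumptions: a 70/15/15 ratio is assumed, only 90° rotations and flips are shown (translations and scaling require interpolation of the edge map and are omitted), and the function names are hypothetical.

```python
import numpy as np

def split_indices(n_images, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle image indices once and cut them into train/val/test subsets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_images)
    n_train = int(round(ratios[0] * n_images))
    n_val = int(round(ratios[1] * n_images))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

def augment_pair(image, edge_map, rng):
    """Apply the same random rotation/flip to an image and its edge map.

    image: (H, W, C) array; edge_map: (H, W) binary array.
    Rotations by multiples of 90 degrees keep edge pixels exactly on the grid.
    """
    k = int(rng.integers(4))                       # 0, 90, 180 or 270 degrees
    image = np.rot90(image, k, axes=(0, 1))
    edge_map = np.rot90(edge_map, k, axes=(0, 1))
    if rng.random() < 0.5:                         # horizontal flip
        image, edge_map = image[:, ::-1], edge_map[:, ::-1]
    if rng.random() < 0.5:                         # vertical flip
        image, edge_map = image[::-1], edge_map[::-1]
    return np.ascontiguousarray(image), np.ascontiguousarray(edge_map)

train, val, test = split_indices(100)
print(len(train), len(val), len(test))             # 70 15 15

# Augment one training pair (toy data: random RGB tile with a horizontal edge).
rng = np.random.default_rng(1)
img = np.random.default_rng(2).random((6, 6, 3))
edges = np.zeros((6, 6), dtype=np.uint8)
edges[2, :] = 1
aug_img, aug_edges = augment_pair(img, edges, rng)
```

Because the same `k` and flip decisions are applied to both arrays, the pixels labeled as edges after augmentation are exactly the transformed copies of the original edge pixels.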
By following this dataset preparation process, we obtained a well-curated and annotated dataset of high-resolution aerial images for the Toronto region, along with ground truth edge maps. This dataset served as the foundation for training and evaluating our proposed edge detection model based on Convolutional Neural Networks (CNNs). The ISPRS dataset provided a reliable and standardized benchmark to assess the performance of our approach and compare it against existing state-of-the-art edge detection methods in the remote sensing domain.

ADVANTAGES AND DISADVANTAGES OF EDGE DETECTORS
Edge detection plays a vital role in various image processing and computer vision tasks. It provides crucial information about object boundaries and spatial structures in an image. However, like any image processing technique, edge detection methods have both advantages and disadvantages. Understanding these strengths and limitations is essential for selecting the appropriate edge detector for a given application. Here, we discuss some of the key advantages and disadvantages of edge detection:

Advantages:
i. Object Localization: Edge detection algorithms excel at localizing and delineating object boundaries in an image. By identifying edges, these algorithms facilitate the extraction of important features for subsequent analysis, such as object recognition, segmentation, and tracking.
ii. Feature Extraction: Edges carry important visual cues, representing changes in color, texture, or intensity gradients. By detecting edges, valuable information about object shapes, textures, and contours can be extracted, enabling further feature-based analysis and interpretation.
iii. Image Understanding: Edge detection aids image understanding by simplifying complex images into their essential components. By reducing the image to its edges, the focus shifts to salient features, facilitating tasks such as object recognition, scene understanding, and image classification.
iv. Low-Level Vision Tasks: Edge detection is a fundamental step in many low-level vision tasks, including image denoising, image enhancement, and image restoration. Edges serve as building blocks for higher-level image processing algorithms and are essential for recovering important image details.
v. Computational Efficiency: Many traditional edge detection algorithms, such as the Canny edge detector, are computationally efficient and can process images in real time, which makes them suitable for applications such as robotics, autonomous driving, and video processing.

Disadvantages:
i. Sensitivity to Noise: Edge detection algorithms are highly sensitive to noise, which can result in the detection of spurious edges or the loss of weak edges. Noisy or low-contrast images may pose challenges for edge detection methods, leading to inaccurate or incomplete edge maps.
ii. Parameter Tuning: Many edge detection algorithms require careful parameter tuning to achieve optimal results. Selecting appropriate thresholds, smoothing parameters, or kernel sizes can be challenging, and suitable values may vary with the characteristics of the image and the intended application.
iii. Scale and Orientation Sensitivity: Traditional edge detection methods often struggle to handle variations in scale and orientation. Detecting edges at multiple scales and orientations is crucial for analyzing complex images containing objects of different sizes and orientations, which can be difficult for conventional edge detectors.
iv. False Positives and False Negatives: Edge detection algorithms may produce false positives or false negatives. Balancing these errors is a delicate trade-off, and some algorithms are more prone to one type of error than the other.
v. Limited Semantic Information: Edge detection alone does not provide detailed semantic information about objects or regions in an image. While edges are useful for understanding shapes and boundaries, they do not capture higher-level semantic attributes such as object categories or scene context.
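The first two disadvantages, noise sensitivity and parameter tuning, can be seen on a synthetic example. The NumPy sketch below thresholds the Sobel gradient magnitude of a noisy vertical step edge with and without prior Gaussian smoothing and counts edge pixels reported away from the true boundary. The image size, noise level, smoothing width `sigma`, the absolute threshold of 2.0, and the width of the "near the edge" band are arbitrary assumptions chosen for illustration; changing any of them changes the counts, which is precisely the tuning burden described above.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def filter2d(img, kernel):
    """Cross-correlate img with a small odd-sized kernel (edge-replicated padding)."""
    r = kernel.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def gaussian_kernel(sigma, radius=3):
    """Normalised 2-D Gaussian kernel of size (2*radius+1) x (2*radius+1)."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    g /= g.sum()
    return np.outer(g, g)

def sobel_magnitude(img):
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_X.T)
    return np.hypot(gx, gy)

# Vertical step edge between columns 15 and 16, corrupted by Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + rng.normal(scale=0.3, size=clean.shape)

threshold = 2.0                              # hand-tuned absolute threshold
band = np.zeros_like(clean, dtype=bool)
band[:, 14:18] = True                        # pixels near the true edge

raw_edges = sobel_magnitude(noisy) >= threshold
smooth_edges = sobel_magnitude(filter2d(noisy, gaussian_kernel(sigma=1.0))) >= threshold

print("spurious edges without smoothing:", int((raw_edges & ~band).sum()))
print("spurious edges with smoothing:   ", int((smooth_edges & ~band).sum()))
```

Smoothing suppresses most spurious responses but also flattens the step's gradient; with a wider `sigma` at the same threshold, the true boundary itself can fall below the threshold, illustrating the smoothing/localization trade-off noted in the abstract.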
It is important to consider these advantages and disadvantages while selecting an edge detection method for a specific application. Modern deep learning-based approaches, such as Convolutional Neural Networks (CNNs), have shown promise in addressing some of the limitations of traditional edge detection methods. These techniques can leverage contextual information and learn discriminative edge features, potentially improving edge detection accuracy and robustness in various scenarios.

METHODOLOGY
The methodology for building edge detection from very high-resolution remote sensing imagery using CNNs involves several key steps. Here, we outline the general framework of the proposed approach. Describing these steps comprehensively ensures transparency and reproducibility of the research findings, allowing for further improvement and comparison with future studies in the field.

EXPERIMENT AND RESULTS
To evaluate the effectiveness of building edge detection from very high-resolution remote sensing imagery using CNNs, we conducted a series of experiments to assess the performance of the proposed methodology. The experimental setup and the corresponding analysis of the results are summarized below:
1. Dataset:
• The ISPRS dataset for the Toronto region was used, consisting of very high-resolution remote sensing imagery with annotated ground truth edge maps (Figure 2).
• The dataset was divided into training, validation, and testing subsets in a predefined ratio (e.g., 70-15-15) to ensure reliable performance evaluation.
2. Model Configuration:
• A CNN-based architecture specifically designed for building edge detection was implemented (Figures 3 and 4).
• The network architecture consisted of multiple convolutional, pooling, and upsampling layers, allowing the extraction of hierarchical features.
• Hyperparameters, including the number of layers, filter sizes, activation functions, and learning rate, were fine-tuned on the validation set.
3. Training Process:
• The model was trained on the training subset with appropriate loss functions.
• Training progress was monitored by tracking metrics such as loss and accuracy.
4. Performance Evaluation:
• Quantitative metrics, namely precision, recall, and F1-score, were used to assess the model's performance.
• These metrics were computed by comparing the edges predicted by the model with the ground truth edge maps of the testing subset.
• A qualitative assessment was conducted by visually comparing the predicted edges against the ground truth edges and the original remote sensing images.
5. Real-World Application:
• The trained edge detection model was applied to real-world scenarios, such as urban feature extraction, land cover classification, or change detection.
• The effectiveness of the model in these applications was evaluated, with emphasis on its ability to provide valuable insights and support decision-making in remote sensing tasks.

In this study, we used recall, precision, and F1-score to evaluate the model's performance. The achieved results are shown in Table 1. The evaluation indices are defined in Eqs. (1), (2), and (3):

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$

where TP represents the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives.

It is important to note that further research and refinement are needed to address the specific challenges and limitations encountered in edge detection for very high-resolution remote sensing imagery. Continued exploration of advanced techniques and incorporation of domain-specific knowledge will contribute to the continuous improvement of edge detection methods in the context of remote sensing.
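The recall, precision, and F1-score used in this study can be computed directly from a predicted and a ground truth binary edge map. The NumPy sketch below counts matches pixel by pixel with zero positional tolerance. This is a simplifying assumption: edge benchmarks such as BSDS usually allow a small localization tolerance when matching predicted and ground truth edge pixels, and the matching rule used for Table 1 is not stated here. The function name and toy maps are hypothetical.

```python
import numpy as np

def edge_scores(pred, gt):
    """Pixel-wise precision, recall and F1-score for binary edge maps.

    pred, gt: arrays of the same shape; non-zero means "edge".
    Assumes exact pixel matching (no localisation tolerance).
    """
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    tp = int(np.sum(pred & gt))      # edge predicted and present
    fp = int(np.sum(pred & ~gt))     # edge predicted but absent
    fn = int(np.sum(~pred & gt))     # edge present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gt = np.zeros((5, 5), dtype=int)
gt[2, :] = 1                 # 5 true edge pixels
pred = np.zeros((5, 5), dtype=int)
pred[2, :4] = 1              # 4 of them detected
pred[0, 0] = 1               # plus 1 false alarm
print(edge_scores(pred, gt))  # precision = recall = F1 = 0.8 (up to rounding)
```

True negatives do not enter any of the three indices, which is why they remain informative on edge maps where the vast majority of pixels are non-edge.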