Evaluating the Influence of Spatial Resolution on Landslide Detection: A Case Study in the Carlyon Beach Peninsula, Washington

Landslides are geological events in which masses of rock and soil slide down the slope of a mountain or hillside. They are influenced by topography, geology, weather, and human activity, and can cause extensive damage to the environment and infrastructure, as well as disrupt transportation networks. Therefore, it is imperative to detect early-warning signs of landslide hazards as a means of prevention. Traditional landslide surveillance consists of field mapping, but the process is costly and time consuming. Modern landslide mapping uses Light Detection and Ranging (LiDAR) derived Digital Elevation Models (DEMs) and sophisticated algorithms to analyze surface roughness and extract spatial features and patterns of landslides and landslide-prone areas. This study follows a previous study that demonstrated that it is possible to detect unstable terrain using algorithmic mapping techniques. The focus of this study is to show how spatial resolution can influence the accuracy of the classification results. The DEM data was resampled from its original 6 ft spatial resolution to 12, 24, 48, and 96 ft. The surface feature extractors employed (local topographic range, local topographic variability, slope, and roughness) are fused and analyzed simultaneously by applying k-means and Gaussian Mixture Model (GMM) clustering methods. When compared with the detailed, independently compiled landslide reference map, the classification results show a decrease in performance as spatial resolution decreases. These results suggest that spatial resolution does impact the performance of landslide classification.


INTRODUCTION
As the third-largest natural hazard, landslides are known to cause trouble throughout the world (Ahmed et al., 2018; Lu et al., 2011; Song et al., 2019). They are a type of mass wasting that encompasses numerous ground movements, including rock falls, severe slope failures, and shallow debris flows (Effat and Hegazy, 2014). They are the result of pre-conditional surface and/or sub-surface instability that follows slope changes, precipitation, or changes to the topography of the ground (Dalyot et al., 2008). Therefore, landslides typically happen on steep mountain or hillside slopes (McKean and Roering, 2004). In addition to natural factors such as geography and climate, landslides are influenced by human activities such as quarrying and development. Since they severely harm the environment and infrastructure, landslides constitute a hazard to everyone on the planet. They are natural disasters that can destroy bridges, buildings, residential developments, and other locations where people live.
Landslides are caused by subsurface instability and encompass a wide range of surface failures, such as debris flows, rockslides, and deep slope failures (Effat and Hegazy, 2014). Slope changes, rainfall, or topographic fluctuations can cause the pre-conditioned surface and/or sub-surface to become unstable, resulting in a landslide (Dalyot et al., 2008). Steeper surfaces are more susceptible to slope change; thus, landslides are more commonly found on mountains or hillsides. Landslides can be affected by several natural processes and by human intervention. Examples include the natural topography of the land, weather, and terrain disturbances such as mining and construction.
Infrastructure and the surrounding area sustain severe damage from ongoing land movement. If a landslide occurs, there is a higher risk to human life near structures like highways, buildings, residential developments, bridges, and other heavily inhabited areas (Mora et al., 2015). For the purpose of recognizing and reducing the impacts of ground movement and protecting human life, accurate landslide mapping is essential.
Landslides have been mapped using photogrammetry, contour mapping, and field inspection technologies (McKean and Roering, 2004; Glenn et al., 2006; Booth et al., 2009; Mora et al., 2015). These techniques do not, however, offer sufficient spatial resolution or detail. Some of these techniques are quite accurate and precise, but they cannot map through thick vegetation, and they can also be expensive, challenging to use, time-consuming, and subjective (Booth et al., 2009; Mora et al., 2015).
With the development of modern mapping technology based on remote sensing, traditional landslide mapping approaches are becoming comparatively less efficient. In the last ten years, both the spatial resolution and accessibility of remote sensing have significantly improved. Airborne laser scanners now capture detailed data sets with a spatial resolution of less than a meter. This information enables analysts to produce detailed surface models that can help in identifying places susceptible to landslides (Tarolli et al., 2012). Light Detection and Ranging (LiDAR) has contributed to an increase in surface data resolution from 10 meters to less than 1 meter (Mora et al., 2015). The identification of small spatial characteristics is made possible by this improvement in resolution. Additionally, LiDAR can penetrate vegetation, and the raw returns obtained by laser scanners can be filtered to create bare-earth models. With LiDAR and bare-earth terrain models, thousands of square meters can be mapped relatively quickly (Shan and Toth, 2008; Booth et al., 2009). Because of its higher spatial resolution and capacity to create realistic ground models, LiDAR can be used to map minor failures in regions of slow mass movement and to distinguish terrain subject to landslides from terrain not subject to landslides (Jaboyedoff et al., 2012). LiDAR data may therefore be useful for studies on ground surfaces and landslide vulnerability (Shan and Toth, 2008; Jaboyedoff et al., 2012). LiDAR mapping has been found to be accurate, time- and cost-effective, and readily available to communities.
Digital Elevation Models (DEMs) can be used to map landslides, which supports infrastructure safety and reduces exposure to landslide hazards. Previous studies have addressed how other elements, such as sediment type, water flow, and rock type, may affect the stability of an area (Leshchinsky et al., 2015; Mora et al., 2015; An et al., 2016). Terrain classification and analysis have become easier and more affordable due to technological approaches. Landslides can be found in DEMs and distinguished from the surrounding stable terrain using feature extraction filters (Glenn et al., 2006; Jaboyedoff et al., 2012; Mora et al., 2015). Landslide studies have made use of features such as slope, hillshade, variability, roughness, aspect, and statistics (Tran et al., 2019). Automated methods have been used to test several of the listed features (Cheng et al., 2013; Hölbing et al., 2015). The features required to identify slides have been found to be specific to each case (Dou et al., 2015). For instance, compared to sandy, dry terrain, locations with dense vegetation will require a distinct set of specialized features to detect landslides. Different features are used to map and categorize landslide and non-slide zones. The ability to use feature extractors in DEMs depends critically on knowing certain information, such as the amount of vegetation, the slope angle, and the kind of soil (Sarkar et al., 2008).
This paper aims to build upon a prior study on the unsupervised classification of Earth surfaces for landslide detection. That study found that it is possible to identify land that is susceptible to landslides using unsupervised statistical classification of specific feature extractors. The extractors used were chosen based on a preliminary study that looked at how different feature extractors and the number of clusters used affected the accuracy of landslide identification. This paper explores how the spatial resolution of the data influences accuracy. The original data has a spatial resolution of 6 ft. This was downsampled to 12 ft, 24 ft, 48 ft, and 96 ft, giving a total of five data sets for the same area that differ only in spatial resolution. All feature extractors used in the prior study are combined in this analysis to ensure that the only variable causing change is resolution. In addition, each data set was tested using both Gaussian Mixture Model (GMM) and K-means classification. Each clustering method was run with several different numbers of clusters, resulting in 40 trials: four cluster counts for each of five data sets and two clustering methods. The resulting data was compared to an independently compiled landslide inventory map. A definite pattern emerged when the data was graphed: as resolution decreased, so did accuracy. These results suggest that the spatial resolution of DEMs directly influences the accuracy of landslide detection.
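The trial count above can be checked with a short enumeration. This is only an illustrative sketch: the variable names are hypothetical, and the cluster counts (two through five) are taken from the clustering sections later in the paper.

```python
from itertools import product

# Experimental grid described in the text (names are illustrative only).
resolutions_ft = [6, 12, 24, 48, 96]   # original DEM plus four downsampled versions
methods = ["GMM", "K-means"]           # two unsupervised clustering methods
cluster_counts = [2, 3, 4, 5]          # two through five clusters

# Every combination of resolution, method, and cluster count is one trial.
trials = list(product(resolutions_ft, methods, cluster_counts))
print(len(trials))
```

Five resolutions, two methods, and four cluster counts give 5 x 2 x 4 = 40 trials.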

Figure 1. Vicinity map of study area in Carlyon Beach, Washington (Tran et al., 2019).
The study area is the Carlyon Beach/Hunter Point landslide in northwestern Thurston County, Washington (approximate latitude N 47° 10' 46", longitude W 122° 56' 24"; see Figure 1). Carlyon Beach sits at an elevation of 165 ft above mean sea level and is lightly covered in vegetation that consists of sub-mature, second-growth coniferous trees, deciduous trees, blackberries, salmonberry, and sword fern. These plants play an important role in the ground movement of the area. The site has slopes that are between 7 and 20 degrees (GeoEngineers, 1999).

Study Area Background
Soil borings were made at the site to determine soil composition. The boring logs show that the soil consists of soft silt, stiff silt, and clay, all of which contribute to the soil's unstable nature. The soil is severely affected by disturbances such as weather, construction, and the natural environment. The soil's instability is the main reason for the landslide found on site. The landslide is located along the northern end of the Steamboat Island Peninsula. It also includes parts of the private community of Carlyon Beach, along with several rural residential dwellings along Northwest Hunter Point Road. Ground movement was first noticed as cracks and settlement in streets and driveways in 1999. The slide area extends from the existing shoreline to the upland of the peninsula and ranges between 700 and 900 feet in width.
Another contributing factor to the ongoing failures is the shallow groundwater table. The reactivation of the ancient landslide is partly due to a general rise in groundwater (GeoEngineers Phase II, 1999). As the sea carries sediment away from the toe of the landslide, the resisting forces are reduced, while the additional groundwater adds to the driving forces of the landslide. When compared to 58 years of precipitation data, the Carlyon Beach Peninsula experienced above-average rainfall, by between 3 and 65 percent, in the last five years (1993-1999) of the study (GeoEngineers Phase II, 1999). Deep-seated landslides make up roughly 26% of the site (Booth et al., 2009). The rate of movement does not seem to be a direct threat to life but has caused significant damage to structures and infrastructure.

Data Acquisition
The LiDAR data was collected by the Puget Sound LiDAR Consortium in 2002. The DEM provides the 3D surface data of the study area used for analysis in this research. Figure 2 shows the DEM as a raster image, and Figure 3 is the unstable terrain map. This DEM has been filtered to show the bare-earth terrain and has a spatial resolution of 6.00 ft (1.83 m). The unstable terrain map was compiled in 2008 by M. Polenz, of the Washington State Department of Natural Resources, using a combination of the DEMs along with aerial images and land-surveying data. The unstable terrain map shows both deep-seated and surficial landslides (Booth et al., 2009).

METHODOLOGY

Workflow
Prior studies have shown that landslide terrain displays larger surface topographic variations, whereas low variations are more commonly found in stable terrain (McKean and Roering, 2004; Booth et al., 2009; Mora et al., 2015). A sliding window was applied to the LiDAR-derived DEM to extract surface topographic variation. This approach analyzes geomorphological features in the terrain. A previous study found that, of the several feature extractors tested, roughness, slope, local topographic range, and local topographic variability were the best for this study site (Tran et al., 2019). Therefore, those were applied in our study.
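As a rough illustration of the sliding-window idea, the sketch below computes two window statistics on a tiny synthetic grid. It makes several assumptions that the text does not state: a 3 x 3 window, local topographic range taken as the maximum minus the minimum elevation in the window, and local topographic variability taken as the standard deviation of elevation in the window. All function names are hypothetical.

```python
from statistics import pstdev

def window_values(dem, row, col, half):
    """Elevations in the (2*half+1)-cell square window, clipped at the grid edge."""
    rows, cols = len(dem), len(dem[0])
    return [
        dem[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]

def sliding_feature(dem, stat, half=1):
    """Apply a window statistic at every cell and return a grid of the same shape."""
    return [
        [stat(window_values(dem, r, c, half)) for c in range(len(dem[0]))]
        for r in range(len(dem))
    ]

def topo_range(values):
    """Assumed definition of local topographic range: max minus min in the window."""
    return max(values) - min(values)

# Synthetic elevations (ft): flat ground with one 4 ft bump.
dem = [
    [100.0, 100.0, 100.0, 100.0],
    [100.0, 100.0, 104.0, 100.0],
    [100.0, 100.0, 100.0, 100.0],
]

range_map = sliding_feature(dem, topo_range)
variability_map = sliding_feature(dem, pstdev)  # assumed definition: std. deviation
```

Cells whose window contains the bump receive a non-zero range and variability, while cells surrounded by flat ground receive zero, which is the contrast between rough and smooth terrain that the classifiers rely on.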
After initial processing, the data was resampled by successive factors of 2, resulting in five data sets with varying spatial resolution (6 ft, 12 ft, 24 ft, 48 ft, and 96 ft). Next, the feature extractors were applied to each DEM, followed by the unsupervised classifiers, k-means and GMM clustering, to recognize topographic patterns and characteristics. The classifiers categorized smooth, stable terrain as non-landslide and rough surface areas as landslide-prone. For this analysis, all four feature extractors were deployed in conjunction so that the only variable factor was spatial resolution. Lastly, a confusion matrix was used to validate the classification results by comparing them to the independently compiled landslide terrain map. An outline of the process is shown in Figure 4.
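The resampling step can be sketched as repeated 2 x 2 aggregation. The text does not say which resampling method was used, so block averaging is an assumption here, and the function names and the synthetic 16 x 16 grid are purely illustrative.

```python
def downsample_by_two(dem):
    """Average each 2x2 block of cells; a trailing odd row or column is dropped.

    Block averaging is an assumption; the paper does not name its resampling method.
    """
    rows = len(dem) // 2 * 2
    cols = len(dem[0]) // 2 * 2
    return [
        [
            (dem[r][c] + dem[r][c + 1] + dem[r + 1][c] + dem[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

def resolution_pyramid(dem, cell_size_ft=6.0, levels=5):
    """Return {cell size in ft: grid}, doubling the cell size at each level."""
    pyramid = {cell_size_ft: dem}
    for _ in range(levels - 1):
        dem = downsample_by_two(dem)
        cell_size_ft *= 2
        pyramid[cell_size_ft] = dem
    return pyramid

# Synthetic 16 x 16 tilted plane standing in for the 6 ft DEM.
dem_6ft = [[float(r + c) for c in range(16)] for r in range(16)]
pyramid = resolution_pyramid(dem_6ft)

for size in sorted(pyramid):
    grid = pyramid[size]
    print(f"{size:>5.0f} ft: {len(grid)} x {len(grid[0])} cells")
```

Each level halves the number of cells along each axis, so fine surface texture is progressively averaged away while the overall mean elevation is preserved.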

K-means Clustering
K-means classification compares each data point with the center of every cluster to determine which group it is closest to. The output is a map that shows which cluster each cell most likely belongs to (Seber, 2008). K-means examines the original data and separates it into classes based on similarity: data within a class are similar to one another, while the classes are clearly distinct from each other. The number of clusters is specified by the researcher; this study tested two through five classes. Map Algebra is used to solve the following equation, which rescales the input raster to a new data range:
$$Z = \frac{(X - \mathrm{Oldmin}) \times (\mathrm{Newmax} - \mathrm{Newmin})}{\mathrm{Oldmax} - \mathrm{Oldmin}} + \mathrm{Newmin}$$
Where:
Z is the output raster with new data ranges.
X is the input raster.
Oldmin is the minimum value of the input raster.
Oldmax is the maximum value of the input raster.
Newmin is the desired minimum value for the output raster.
Newmax is the desired maximum value of the output raster.
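The two steps in this section, rescaling a raster to a new range and grouping cells with k-means, can be sketched in a few lines. This is a minimal illustration and not the software used in the study: the rescaling is assumed to be the usual linear min-max mapping between the old and new ranges, the k-means loop is the textbook version with a simple deterministic start, and the slope and roughness values are made up.

```python
import math

def rescale(values, new_min=0.0, new_max=1.0):
    """Linear min-max rescaling (assumed form): map [old_min, old_max] to [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        return [new_min for _ in values]  # constant input: nothing to stretch
    return [(v - old_min) * (new_max - new_min) / span + new_min for v in values]

def kmeans(points, k, iterations=50):
    """Textbook k-means: assign each point to its nearest centroid, then update centroids."""
    # Deterministic start: k points spread evenly through the input order.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Made-up feature values for six cells: three smooth, three rough.
slope_deg = [2.0, 3.0, 4.0, 18.0, 20.0, 22.0]
roughness = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8]

# Rescale each feature to 0-1 so that neither dominates the distance.
points = list(zip(rescale(slope_deg), rescale(roughness)))
labels, centroids = kmeans(points, k=2)
print(labels)
```

Rescaling before clustering matters because k-means uses Euclidean distance, so a feature measured in degrees would otherwise outweigh one measured on a 0 to 1 scale.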

Gaussian Mixture Model Clustering
The second statistical model used for data classification is GMM. GMM clustering estimates the statistical probability of a point belonging to each class. It continues to reassign classes until it has maximized the likelihood of the model given the data set. GMM clustering is more adaptable to different data sets because it offers both hard and soft (fuzzy) clustering options.
Hard clustering is when a data point is associated with only one cluster. Soft (fuzzy) clustering assigns each point a score that indicates how strongly that data point is related to each cluster (McLachlan, 2000). Similar to K-means clustering, GMM was tested using two through five classes. In the mixture model, $\mu_i$ is the mean of the i-th vector component and $\Sigma_i$ is the covariance matrix of the i-th vector component.
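For reference, the standard textbook form of a Gaussian mixture density is sketched below. It is not necessarily the exact expression used in this study; the mixing weights $w_i$, the number of components $K$, and the feature dimension $d$ are symbols assumed here for illustration.

```latex
p(x) = \sum_{i=1}^{K} w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{K} w_i = 1

\mathcal{N}(x \mid \mu_i, \Sigma_i) =
  \frac{1}{(2\pi)^{d/2} \, |\Sigma_i|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)

\gamma_i(x) = \frac{w_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i)}
                   {\sum_{j=1}^{K} w_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```

In this formulation, the soft (fuzzy) score of a point for cluster i is $\gamma_i(x)$, and hard clustering assigns the point to the cluster with the largest $\gamma_i(x)$.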

Accuracy Assessment (Confusion Matrix)
The confusion matrix was used as the accuracy assessment method. A confusion matrix compares the classification results against the reference data. Four outcome categories are used for comparison: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). True indicates that the algorithm has correctly identified the area, regardless of its class; false, conversely, means that the area was incorrectly identified. Positive stands for unstable terrain, meaning the area is susceptible to landslides; negative refers to the opposite, meaning the area is stable and does not exhibit any landslide features. In addition, accuracy (AC) and precision (P) were calculated from the data. Accuracy measures how often the algorithm correctly identifies the terrain, and precision measures how much of the terrain classified as landslide is landslide in the reference map.
TP and TN need to be maximized, as they represent correctly identified landslide and non-landslide areas, respectively. FP, a statistical type I error, should be minimized, but it does not have consequences as severe as those of a type II error. FP means that an area has been falsely identified as landslide-susceptible terrain: the area is in fact stable and unlikely to start moving. FN, a type II error, means that an area has been falsely identified as non-slide terrain when it is a slide area. This is dangerous because it overlooks the hazards of slide terrain by marking it as safe. Buildings constructed on such misidentified land are at a higher risk of severe damage from unaccounted ground movement.
The confusion matrix is used to compare the percentage of matching terrain between the algorithm mapped areas and inventory map. K-means and GMM clustering methods provide clustered results for the algorithm map. Since each feature extractor was tested by applying several classes, resulting classes needed to be manually assigned landslide or non-landslide.
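A minimal sketch of the cell-by-cell comparison is given below. It assumes the conventional definitions AC = (TP + TN) / total and P = TP / (TP + FP), which the text does not spell out, and it uses made-up one-dimensional cell lists in place of the real rasters. The function names are hypothetical.

```python
def confusion_counts(predicted, reference):
    """Count TP, FP, TN, FN for paired cells (1 = landslide, 0 = non-landslide)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for pred, ref in zip(predicted, reference):
        if pred and ref:
            counts["TP"] += 1      # landslide correctly mapped
        elif pred and not ref:
            counts["FP"] += 1      # stable terrain flagged as landslide
        elif not pred and ref:
            counts["FN"] += 1      # landslide missed (the dangerous case)
        else:
            counts["TN"] += 1      # stable terrain correctly mapped
    return counts

def accuracy(counts):
    """Assumed definition: share of all cells that match the reference map."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

def precision(counts):
    """Assumed definition: share of cells flagged as landslide that truly are."""
    flagged = counts["TP"] + counts["FP"]
    return counts["TP"] / flagged if flagged else 0.0

# Made-up example: algorithm map versus inventory map for eight cells.
algorithm_map = [1, 1, 1, 0, 0, 0, 0, 1]
inventory_map = [1, 1, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(algorithm_map, inventory_map)
print(counts, accuracy(counts), precision(counts))
```

In this toy case, six of the eight cells agree with the inventory map, and one landslide cell is missed, which is the FN case the text warns about.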

RESULTS AND DISCUSSION
All geomorphological features extracted were tested individually in a prior study. This study tests all four features in conjunction, with differing numbers of classes, by applying K-means and GMM clustering. After the data is clustered, each class must be manually assigned to either a landslide or non-landslide group. Figure 5 demonstrates the process: it begins with feature extraction (A), then clustering with the specified number of clusters (B), followed by separation of clusters into a landslide or non-landslide group (C), and lastly comparison of the algorithm-mapped landslide locations to the inventory-mapped landslides (D). In Figure 5D, the locations marked in the lighter grayscale are areas that were correctly mapped (TN and TP), whereas the darker grayscale indicates areas that were incorrectly mapped (FP and FN). This process was repeated for each of the spatial resolutions tested, with two through five clusters, using both GMM and K-means clustering methods.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-M-3-2023, ASPRS 2023 Annual Conference, 13-15 February & 12-15 June 2023
Tables 1 and 2 display the confusion matrix results for every combination of spatial resolution and number of classes: Table 1 shows the results obtained by applying the GMM clustering method, and Table 2 shows the results from K-means clustering. The two most important values to analyze from the confusion matrix are AC and FN. AC is the accuracy of the analysis, that is, how often the algorithm-mapped areas were correctly identified. In other words, a higher AC value corresponds to more of the algorithm map matching the landslide inventory map.
On the other hand, FN values should be minimized, as they represent the percentage of areas that have been incorrectly identified as non-slide terrain. This is dangerous because the algorithm deems a location safe (non-slide terrain) when it is not. TP is the opposite of FN: it represents areas that were correctly identified as landslides. The two values (TP and FN) are inversely related. If the percentage of landslides that were correctly mapped increases, then the percentage of landslides that were incorrectly identified decreases by the same amount.
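The inverse relation can be written out explicitly, assuming (as the percentages reported in this section suggest) that TP and FN are both expressed as shares of the reference landslide area:

```latex
\mathrm{TP}_{\%} = \frac{TP}{TP + FN} \times 100, \qquad
\mathrm{FN}_{\%} = \frac{FN}{TP + FN} \times 100, \qquad
\mathrm{TP}_{\%} + \mathrm{FN}_{\%} = 100
```

Under this assumption, a TP reading of 100% implies an FN reading of 0%, and any gain in one is matched by an equal loss in the other.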
When interpreting the data, it is paramount to look at all aspects of the data. Looking at only one part can lead to incorrect conclusions and skew findings. For example, when a GMM clustering method is applied to the DEM with 6 ft resolution and two clusters, TP has a reading of 100%. The algorithm classified all landslide areas correctly, but only because nearly all of the terrain was classified as part of the landslide group. This is evident from the TN value, which was very low, indicating that very little of the non-slide terrain was correctly identified. AC was also used to reach this conclusion. For this study, the value analyzed was AC, because the goal of the study is to find the effects of resolution on accuracy.
GMM clustering was performed to obtain the data in Table 1. The highest accuracy readings for the 6 ft, 12 ft, 24 ft, 48 ft, and 96 ft DEMs are 85.86%, 86.35%, 85.42%, 81.62%, and 79.81%, respectively. Although the differences between the accuracies are small, it is notable that the overall accuracy decreases as the resolution decreases. The one exception is between the 6 ft and 12 ft DEMs, where accuracy increases; however, given the corresponding number of clusters, this increase is acceptable. It may be that the resolution is still fine enough at 12 ft that the difference is negligible. Regardless of resolution or clustering method, the data suggests that a higher number of clusters increases accuracy. In every case, when the data was clustered into either 4 or 5 clusters, the accuracy was higher than with 2 or 3 clusters. A prior study found that the effects of having more than 5 clusters were marginal, and thus they were not considered in this study. A single cluster does not separate the data into groups, because all of the data is placed into that one cluster.
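The step-to-step changes in the best GMM accuracies quoted above can be worked out directly. The snippet below only does arithmetic on the five figures reported in the text; the variable names are illustrative.

```python
# Highest GMM accuracy (%) reported for each DEM cell size (ft).
best_accuracy = {6: 85.86, 12: 86.35, 24: 85.42, 48: 81.62, 96: 79.81}

resolutions = sorted(best_accuracy)

# Change in accuracy (percentage points) between consecutive resolutions.
steps = {
    (fine, coarse): round(best_accuracy[coarse] - best_accuracy[fine], 2)
    for fine, coarse in zip(resolutions, resolutions[1:])
}

# Net change from the finest to the coarsest DEM.
total_change = round(best_accuracy[96] - best_accuracy[6], 2)

for (fine, coarse), delta in steps.items():
    print(f"{fine:>2} ft -> {coarse:>2} ft: {delta:+.2f} points")
print(f"6 ft -> 96 ft overall: {total_change:+.2f} points")
```

Only the 6 ft to 12 ft step is positive (about half a percentage point); every later step is negative, with the largest single drop between 24 ft and 48 ft.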

CONCLUSION
3D information, such as surface data, was utilized to categorize landslides. Following an initial investigation into the feasibility of algorithmically mapping features of landslide surfaces, the subsequent step involved assessing the influence of spatial resolution. The original high-detail dataset, which had a spatial resolution of 6 feet, was downsampled to 12, 24, 48, and 96 feet. Afterward, an unsupervised classification was carried out using all four surface attributes (local topographic range, local topographic variability, slope, and roughness). Initial results indicate that as the spatial resolution decreased, the classification accuracy also decreased. A reduction in accuracy of up to 15% was observed in this study. Consequently, it was determined that the spatial resolution of 3D data like Digital Elevation Models (DEMs) indeed affects the accuracy of landslide classification.
Landslides occurring in the Carlyon Beach Peninsula have had significant impacts on human life, infrastructure, and the economy. To address this, advances have been made in landslide mapping technology over the past few decades to mitigate the repercussions of these geological events. Conventional mapping approaches like analyzing aerial photographs and conducting field inspections are still used worldwide to identify areas prone to landslides. However, these methods are time-intensive, expensive, and often unable to detect small-scale failures. In contrast, contemporary technologies utilize DEMs and automated algorithms, which can accurately identify landslide and non-landslide areas in a cost-effective manner. The techniques involve analyzing the DEM of the study region to differentiate stable terrain with smooth features from landslide-prone areas with rugged surfaces. Classification outcomes from k-means and Gaussian Mixture Model (GMM) clustering can achieve accuracy rates of up to 87% when compared to existing landslide inventory maps. Consequently, modern methods can swiftly identify landslide terrain using DEMs in a budget-friendly and accessible way. Future recommendations encompass expanding the dataset size and combining multiple feature extraction methods to enhance classification outcomes. This approach could potentially lead to improved results and the applicability of these techniques in other geographical areas.