Urban lakes change extraction using time series GaoFen-1 satellite imagery

Urban lakes play an indispensable role in maintaining the ecological balance of cities, ensuring flood safety, and providing recreational spaces for tourism. With intensifying human activity and economic development, the extent of urban lakes is inevitably affected. However, the ability to detect detailed temporal changes in urban lake areas from high-resolution data remains limited. This study proposes a novel method that combines time series Gaofen-1 (GF-1) remote sensing data with the random forest machine learning algorithm to explore changes in urban lakes, taking Zhushan Lake in Wuhan as the study case. Surface water was extracted for Zhushan Lake and its surrounding pit-ponds from 2013 to 2020, and the characteristics and driving factors of the lake changes were then analyzed quantitatively. We find that (1) the accuracy of surface water extraction using the random forest classification method consistently exceeded 96%, with the Kappa coefficient ranging from 0.86 to 0.99; and (2) a noticeable decline was observed in the water areas of Zhushan Lake and its surrounding pit-ponds, predominantly along the northwestern shoreline and in the eastern pond regions. This decline is primarily attributed to pressure from building construction. The proposed methodology is suitable for managing the area of lakes in urban settings.


INTRODUCTION
Urban lakes play a pivotal role in regional natural and human environments. They not only supply water for living and production but also perform multiple functions, including flood control (Hayashi et al., 2008), urban heat island regulation (Yang et al., 2015), water storage (Song et al., 2013), tourism (Gossling et al., 2012), and climate regulation (Yan et al., 2013). These functions give significant impetus to the sustainable development of cities. However, urban lakes have faced severe threats from urbanization in recent years (Zhang et al., 2018), such as water pollution (Brock et al., 2006) and lake area shrinkage (Kai et al., 2010). Investigating changes in urban lake area is therefore of great significance for the sustainable development of the urban environment.
Objective and accurate observation of changes in the area of urban lakes is essential for urban land use mapping and ecosystem assessment. Remote sensing, with its high spatial and temporal resolution, wide coverage, long time series, diversity of data, and real-time observation capability, has significant advantages for observing changes in urban lakes, making it a crucial tool for exploring their change patterns.
The surface water extent of an urban lake is an important indicator of its change pattern. Previous studies have developed a range of water extraction methods (He et al., 2018). These methods fall mainly into two types: threshold segmentation methods and classifier model methods. Threshold segmentation methods can be further divided into single-band methods, spectral relationship methods, and water body index methods. For example, Liu employed the single-band method and multi-source remote sensing data to extract the water area of Lake Chad in Africa (Liu et al., 2013); Jiang combined the spectral features of the vegetation red edge and shortwave infrared bands in Sentinel-2 satellite imagery to propose a new water index termed the surface water index (SWI) (Jiang et al., 2021); and Zhang employed Landsat imagery and Gaofen data to extract and analyze the spatiotemporal variations in the surface area of Poyang Lake using the surface water index method (Zhang et al., 2019).
The threshold method requires segmentation thresholds to be set manually from prior knowledge, which can cause errors when extracting water bodies in a complex urban environment (Zhou et al., 2020). The classifier model method treats water as a land cover category and applies a classification algorithm to extract water bodies from remote sensing imagery. Popular methods include object-oriented classification, decision trees, support vector machines, neural networks, and random forest (Jiang et al., 2018). For example, Shen proposed a decision tree model for extracting water bodies in mountainous areas based on GF-5 image data and extracted the water bodies in the Dongchuan District of Kunming City (Shen et al., 2021). Dong employed Landsat 7 ETM data and a support vector machine to determine the surface water area of Guiyang City, then analyzed its spatiotemporal change characteristics and driving factors (Dong et al., 2022). A multilayer perceptron (MLP) neural network was proposed to identify surface water in Landsat 8 satellite images, and its extraction accuracy was compared with that of water indices and support vector machines. The results showed that the MLP method performed better than both (Jiang et al., 2018).
Random forest is a widely used classification algorithm for remote sensing imagery. Previous studies indicate that random forest has significant advantages in classification accuracy, training time, and classifier stability across varying training samples and study areas (Pal et al., 2010). At present, extracting detailed temporal changes in urban lakes from high spatial resolution remote sensing imagery remains challenging because urban lakes are small and the urban environment is complex. To explore the change pattern of urban lakes using remote sensing data, this study proposes a new method that combines time series Gaofen-1 satellite remote sensing data from 2013 to 2020 with the random forest machine learning algorithm. Zhushan Lake, located in Wuhan, is then taken as the experimental area for a quantitative analysis of the change pattern of its water areas.

Study Area
Wuhan is characterized by its intricate network of rivers, ports, and a myriad of lakes and ponds, earning it the moniker "City of Hundred Lakes", with 166 lakes on record. One such lake is Zhushan Lake (Figure 1), which is located in the eastern part of Caidian District and is among the 43 urban lakes in Wuhan. Zhushan Lake, a crucial water source for Wuhan, not only supplies drinking water to residents but also plays an indispensable role in agricultural irrigation and industrial water usage. Moreover, the lake holds significant ecological and tourist value. The surrounding ecosystem offers recreational and natural sightseeing opportunities for local residents, thereby contributing to the city's ecological balance. However, in recent years, rapid urbanization and human activity have affected the extent of Zhushan Lake.
Figure 1. Location of the study area

Data collection and preprocessing
This study collected time series Gaofen-1 (GF-1) remote sensing data from 2013 to 2020 (Table 1). To ensure that lake extraction results are comparable across periods, images acquired during the wet season (May to September) were preferred, and the acquisition dates were kept as consistent as possible from year to year. Where cloud cover would have degraded the precision of water body extraction, images from adjacent months were selected instead. The images then underwent preprocessing, including geometric correction, image fusion, and image clipping, yielding an experimental dataset with 2 m spatial resolution.

Table 1 columns: Year, Sensor

Research method flow
The flowchart of this study is shown in Figure 2 and consists of four parts. The first part covers the collection of time series GF-1 image data and their preprocessing. In the second part, training samples and feature bands are selected to construct a random forest model for extracting lake water areas. The third part verifies the accuracy of the extracted lake water areas. In the final part, indicators for evaluating lake changes are proposed to explore the change pattern and driving factors of urban lakes.

Random Forest Classification Algorithm
Random forest is an ensemble learning method composed of multiple decision trees. Each decision tree is trained on a random subset of the training data, and the trees are combined to obtain the classification result (Amit et al., 2001). The classification principle is as follows. Let the training sample set $X$ contain $n$ samples, each with $M$ features, and let each sample correspond to a label in the target set $Y$, which is divided into $K$ classes; for water extraction there are two classes (water body and non-water body). We therefore have the training sample set $X = \{x_1, x_2, \ldots, x_n\}$ and the target set $Y = \{y_1, y_2, \ldots, y_n\}$.
From the training sample set $X$, $n'$ training samples are randomly drawn using the bootstrap sampling method (where $n' < n$) to form a new training subset $D$. For each new training subset, $m$ features are randomly selected ($m < M$), and the optimal feature for splitting is chosen from these $m$ features.
The values of the $j$-th selected feature $x^{(j)}$ in the training subset $D$ are sorted in ascending order, denoted $\{x^{(j)}_1, x^{(j)}_2, \ldots, x^{(j)}_{n'}\}$, to obtain the set $T = \{t_1, t_2, \ldots, t_{n'-1}\}$ of $n' - 1$ split points of the feature $x^{(j)}$, where
$$ t_i = \frac{x^{(j)}_i + x^{(j)}_{i+1}}{2}, \quad i = 1, 2, \ldots, n' - 1 \tag{1} $$
where $x^{(j)}_i$ = feature value of the $i$-th training sample in ascending order.
$t_i$ = $i$-th split point of the feature $x^{(j)}$.
Selecting the $j$-th feature $x^{(j)}$ and the $i$-th split point $t_i$ divides the training subset $D$ into two feature space subsets:
$$ D_1 = \{x \in D \mid x^{(j)} \le t_i\}, \qquad D_2 = \{x \in D \mid x^{(j)} > t_i\} \tag{2} $$
All features in the training subset $D$ are traversed, and the Gini coefficient is calculated to find the optimal split feature and the optimal split point, i.e. the pair with the smallest Gini coefficient after splitting, with which the decision tree is constructed:
$$ \mathrm{Gini}(D, x^{(j)}, t_i) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2) \tag{3} $$
$$ \mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2 = 1 - \sum_{k=1}^{K} \left( \frac{|C_k|}{|D|} \right)^2 \tag{4} $$
where $|D_1|$, $|D_2|$ = number of samples corresponding to the feature space subsets $D_1$ and $D_2$.
$|D|$ = number of samples in the training sample subset $D$.
$p_k$ = proportion of samples in the dataset that belong to category $k$.
$C_k$ = subset of samples in the training subset $D$ that belong to the $k$-th class.
The above steps are repeated until the predetermined number of decision trees has been constructed. Once the forest is built, each decision tree classifies an input pixel as water body or non-water body and casts a vote. The class with the most votes is output as the final classification result.
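The split search and the voting step described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: the function names (`gini`, `best_split`, `majority_vote`) and the near-infrared reflectance values are hypothetical, and only a single feature is searched, whereas a full random forest repeats this over bootstrap samples and random feature subsets.

```python
from collections import Counter


def gini(labels):
    """Gini coefficient of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (split point, weighted Gini) of the best midpoint split of one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(n - 1):
        lo, hi = pairs[i][0], pairs[i + 1][0]
        if lo == hi:  # identical neighbours give no usable split point
            continue
        t = (lo + hi) / 2.0  # midpoint between consecutive sorted values
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        # Gini of each side, weighted by its share of the samples
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def majority_vote(predictions):
    """Final class = the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical near-infrared reflectances (assumed values, not study data):
# water pixels are dark, land pixels are bright.
nir = [0.05, 0.08, 0.10, 0.30, 0.35, 0.40]
cls = ["water", "water", "water", "land", "land", "land"]
threshold, impurity = best_split(nir, cls)
print(threshold, impurity)
print(majority_vote(["water", "land", "water"]))
```

In this toy case the best split falls midway between the brightest water pixel and the darkest land pixel and separates the two classes completely, so the weighted Gini coefficient is zero.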

Accuracy Assessment
To validate the precision of the water body extraction performed by the random forest classifier, this study used a confusion matrix to compute two accuracy evaluation indicators: the Overall Accuracy (OA) and the Kappa Coefficient.
(1) Overall accuracy (OA)
$$ \mathrm{OA} = \frac{TP + TN}{TP + TN + FP + FN} $$
where $TP$ = number of pixels that are actually water body pixels and are also detected as water body pixels.
$TN$ = number of pixels that are actually non-water body pixels and are also detected as non-water body pixels.
$FP$ = number of pixels that are actually non-water body pixels but are detected as water body pixels.
$FN$ = number of pixels that are actually water body pixels but are detected as non-water body pixels.
(2) Kappa coefficient
$$ \mathrm{Kappa} = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \frac{(TP + FP)(TP + FN) + (FN + TN)(FP + TN)}{N^2} $$
where $p_o$ = overall accuracy.
$p_e$ = agreement expected by chance.
$N$ = total number of samples.
For each year from 2013 to 2020, 500 sample pixels were randomly selected from the GF-1 imagery. The confusion matrix was then established by manual visual interpretation to compute the accuracy evaluation indicators.
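As an illustration of how the two indicators follow from a binary confusion matrix, the sketch below computes OA and the Kappa coefficient in their standard forms. The counts are hypothetical values for one year's 500 validation pixels, chosen only for the example; they are not results of this study.

```python
def overall_accuracy(tp, tn, fp, fn):
    """Share of validation pixels whose detected class matches the true class."""
    return (tp + tn) / (tp + tn + fp + fn)


def kappa(tp, tn, fp, fn):
    """Cohen's Kappa for a 2x2 confusion matrix (water / non-water)."""
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n  # observed agreement, identical to OA
    # chance agreement: product of detected and actual totals per class
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical confusion-matrix counts (assumed, not from the study).
tp, tn, fp, fn = 180, 305, 5, 10
oa = overall_accuracy(tp, tn, fp, fn)
k = kappa(tp, tn, fp, fn)
print(f"OA = {oa:.2%}, Kappa = {k:.3f}")
```

Because Kappa discounts the agreement expected by chance, it is lower than OA whenever the classification is imperfect, which is why both indicators are reported together.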

Urban lake change index
Four key indicators were selected: lake area, shoreline length, lake shape index, and lake fragmentation (Xie et al., 2018). These indicators were used to quantitatively analyze the change pattern in this study.
(1) Lake Shape Index (LSI), computed from the shoreline length and the lake area. The lake shape index is a quantitative measure for evaluating the influence of human activities on lake landscapes. A lower LSI value signifies a lake with a relatively simple geometric shape, indicating heightened susceptibility to the impacts of external activities. Conversely, a higher LSI value represents a lake with a more complex geometric shape, suggesting a lower level of human interference.
(2) Lake Fragmentation (LF), computed from the number of lakes. Lake fragmentation quantifies the degree to which a lake is fragmented and reflects the complexity of the lake's spatial structure.
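The exact formulas for the two indices are not legible in this text, so the following sketch rests on assumptions: it uses a commonly applied form of the shape index, LSI = L / (2 * sqrt(pi * A)), which equals 1 for a perfect circle, and takes fragmentation as the number of water bodies per unit area, LF = N / A. Both formulas and the function names are illustrative choices and may differ from the definitions used in the study.

```python
import math


def lake_shape_index(shoreline_km, area_km2):
    """ASSUMED form: LSI = L / (2 * sqrt(pi * A)); 1.0 for a circular lake."""
    return shoreline_km / (2.0 * math.sqrt(math.pi * area_km2))


def lake_fragmentation(n_lakes, area_km2):
    """ASSUMED form: LF = N / A, water bodies per square kilometre."""
    return n_lakes / area_km2


# Sanity check with a circular lake of radius 1 km: L = 2*pi, A = pi.
circle_lsi = lake_shape_index(2.0 * math.pi, math.pi)
print(round(circle_lsi, 6))

# Hypothetical irregular lake: same area, three times the shoreline.
print(round(lake_shape_index(6.0 * math.pi, math.pi), 6))
print(lake_fragmentation(10, 2.0))
```

Under the assumed shape index, lengthening the shoreline at constant area raises the value, which matches the reading given above that higher values correspond to more complex outlines.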

Extraction result and accuracy assessment
Figure 3 shows the lake surface water extracted using the random forest classifier and the time series GF-1 imagery. Figure 4 reports the Overall Accuracy (OA) and Kappa Coefficient computed from manually interpreted samples. These indicators validate the precision of the water body extraction performed by the random forest algorithm.
Figure 4 reveals that the accuracy of water body extraction using the random forest classification method consistently exceeded 96%. The Kappa coefficient ranged from a minimum of 0.86 to a maximum of 0.99. Taken together, these two accuracy evaluation indicators show that the random forest method performs well in extracting the water bodies of urban lakes.

Urban lake change analysis
Table 2 presents the values of the four change indicators (shoreline length, lake area, lake shape index, and lake fragmentation) for Zhushan Lake from 2013 to 2020. The results reveal a noticeable decline in both shoreline length and lake area since 2013. By 2020, the water area of Zhushan Lake and its surrounding pit-ponds had decreased by 1.11 km² compared with 2013, and the shoreline length of the ponds had shortened by 7.63 km. This corresponds to a shrinkage rate of 28.73% for the water area and a shortening rate of 16.4% for the shoreline. The lake shape index was higher in 2020 than in 2013, suggesting that the lake was more severely impacted by external activities in 2020. Conversely, lake fragmentation was greater in 2013 than in 2020, indicating relatively better biodiversity in 2013.
The water extraction results from 2013 to 2020 reveal a significant reduction in lake area along the northwestern shore and in the ponds in the eastern part of the study area. In the northwestern region, lake infilling began in 2014. Construction of a factory commenced in 2018 and was completed by 2019. In the eastern region, infilling of pit-ponds started in 2013. From 2015 to 2018, construction took place on the infilled lake area, and this project was essentially completed by 2018. It can therefore be concluded that the changes in the water area of Zhushan Lake and its surrounding pit-ponds are primarily due to anthropogenic infilling of lakes and ponds. The infilled areas are mainly used for building construction.
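The relationship between the absolute decreases and the reported rates can be checked with a short calculation. The sketch assumes that each rate is the decrease divided by the 2013 value; the 2013 baselines themselves are not quoted in the text, so they are back-calculated here from the reported figures and should be read as implied values rather than Table 2 entries.

```python
def change_rate(decrease, baseline):
    """Relative decrease in percent, assuming rate = decrease / 2013 value."""
    return 100.0 * decrease / baseline


def implied_baseline(decrease, rate_percent):
    """2013 value implied by an absolute decrease and its reported rate."""
    return decrease / (rate_percent / 100.0)


# Reported decreases and rates for 2013-2020 (from the text).
area_2013 = implied_baseline(1.11, 28.73)   # about 3.86 km^2
shore_2013 = implied_baseline(7.63, 16.4)   # about 46.5 km

print(f"implied 2013 water area: {area_2013:.2f} km^2")
print(f"implied 2013 shoreline:  {shore_2013:.1f} km")
print(f"implied 2020 water area: {area_2013 - 1.11:.2f} km^2")
print(f"implied 2020 shoreline:  {shore_2013 - 7.63:.1f} km")
```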

CONCLUSION
This study, focusing on Zhushan Lake and the surrounding pit-ponds in Wuhan City, used time series GF-1 remote sensing imagery and the random forest algorithm to extract the water bodies of Zhushan Lake. The following conclusions were obtained: (1) The random forest algorithm achieved high precision in water body extraction. The OA exceeded 96% and the overall Kappa coefficient was 0.95.
(2) From 2013 to 2020, the water body area of Zhushan Lake and its surrounding ponds decreased by 1.11 km², and the shoreline length was reduced by 7.63 km. The shrinkage rate of the water body area was 28.73%, and the shortening rate of the pond shoreline was 16.4%.
(3) Analysis of the key changing areas showed that Zhushan Lake and its surrounding pit-ponds were mainly affected by building construction.
The results of this study demonstrate the feasibility and accuracy of using the random forest algorithm for water body extraction, providing a methodological reference for exploring the factors that influence changes in river and lake water environments.

Figure 2. Flow chart of research method

Figure 4. Accuracy verification of surface water extraction

Table 1. The metadata of the time series GF-1 remote sensing data

Table 2. Urban lake change index