A NEW APPROACH FOR MAPPING LAND USE / LAND COVER USING GOOGLE EARTH ENGINE: A COMPARISON OF COMPOSITION IMAGES

In view of the increase in human activities, climate change and related hazards, land use and land cover (LULC) mapping is becoming a fundamental part of any development or hazard prevention project. From this perspective, we propose a new approach for mapping LULC with machine learning algorithms by comparing the results of five composition methods in Google Earth Engine (GEE) for the city of Tetouan, Morocco. Using Sentinel-2 Level-2A (S2 L2) imagery as source data, five datasets were derived for classification by applying aggregating functions (median, mean, max, min and mode). Samples were then selected from the very high resolution (VHR) satellite images provided by Google Earth and divided into five classes (barren land, water surface, vegetation, forest, and urban areas). The samples were split into two parts: 70% as training data, used to feed the machine learning algorithms (support vector machine (SVM), random forest (RF) and classification and regression trees (CART)), and 30% as testing data for evaluating the models using accuracy assessments. The results for all datasets indicate that SVM has the highest accuracy, performing better than RF and CART. The average overall accuracy of SVM, RF and CART was 87.99%, 87.81% and 84.72%, respectively. Furthermore, for each algorithm, the comparison between the different composites indicates that the mean composite is the most suitable for LULC mapping. Finally, GEE has proven to be an effective and rapid tool for LULC mapping, especially with the use of composite imagery, which can assist decision makers in future planning or risk prevention.


INTRODUCTION
Today, there is a need to describe what is on the earth's surface at a given location (Zheng et al., 2021), especially as the world's land surface is changing rapidly, which has made land use and land cover (LULC) mapping an essential tool for land management and planning (Mallupattu & Sreenivasula Reddy, 2013). Furthermore, with global warming and climate change, LULC is becoming a critical factor influencing many hazards (Shankar Prasad & Wasini Pandey, n.d.) such as floods (Zope et al., 2017), flash floods (Sellami et al., 2022), landslides (Shu et al., 2019) and debris flows (Abuzied & Pradhan, 2020). Because these changes are rapid and these hazards unexpected, LULC mapping must keep pace by becoming faster and more accurate throughout the year.
Historically, LULC mapping was first used in the 1950s with the green revolution, and then in the 1960s when modern economic development plans were developed and implemented (Alshari & Gawali, 2021). In the 1970s, supervised and unsupervised techniques appeared, namely decision trees and neural networks for supervised classification, and mixed modelling and vagal taxonomy for unsupervised classification (Vali et al., 2020). In recent years, object-based classification and object-based image analysis have emerged as techniques for processing digital images using artificial intelligence and deep learning (Alshari & Gawali, 2021).
Remotely sensed imagery is the most widely used data source for LULC mapping and land change monitoring (Roy et al., 2014), in particular Landsat images, which are the most commonly used data (Wulder et al., 2016) because Landsat is the only system that offers images with a spatial resolution of 30 m, a temporal resolution of 16 days and an archive spanning 30 years (Noi Phan et al., 2020). Other data sources for LULC studies include Sentinel-2 (Thanh Noi & Kappas, 2017), the Moderate Resolution Imaging Spectroradiometer (MODIS) (Wan et al., 2015; Xin et al., 2013), Satellite pour l'Observation de la Terre (SPOT) (Disperati & Virdis, 2015) and Synthetic Aperture Radar (SAR) (Reiche et al., 2015). The problem with remotely sensed imagery is the large volume of data involved when mapping large areas or tracking LULC changes across dates (Wan et al., 2015), which makes a personal computer ill-suited to the task and makes the work expensive in terms of storage capacity and time.
This is where cloud computing platforms such as FORCE (Framework for Operational Radiometric Correction for Environmental monitoring) and GEE (Google Earth Engine) come in. FORCE is considered a good processing engine for medium-resolution Earth observation image archives, and it is an all-in-one solution for processing large-area and time series data (Frantz, 2019). Google Earth Engine is a computing platform capable of archiving petabyte-scale geospatial data, freely accessible and efficient for visualization and analysis (Mutanga & Kumar, 2019). What makes it more flexible is that users avoid downloading and storing images and gain access to more computing power to analyse and process the images on the platform itself, without installing any local software (Nasiri et al., 2022).
As a result, many recent studies have used Google Earth Engine to map LULC instead of conventional methods. For example, Liu et al. (2020) exploited time series data available on GEE with a random forest algorithm to map LULC change in Ganan Prefecture. Another study in Northern Iran (Feizizadeh et al., 2021) used support vector machine (SVM), random forest (RF) and classification and regression tree (CART) algorithms to map LULC from Landsat image time series in GEE, with SVM reaching the highest accuracy of 90.25%. In the Munneru river basin in India, Loukika et al. (2021a) classified LULC using SVM, RF and CART on median composite imagery and compared the performance of the three algorithms, concluding that SVM is the best. The last three years have seen a significant increase in studies using GEE for mapping and monitoring LULC. Most of them compare the performance of classifiers, but very few compare results from the same dataset under different image compositions, as in (Nasiri et al., 2022), (Praticò et al., 2021) and (Noi Phan et al., 2020). The aim of this study is therefore to propose a new approach for mapping land use and land cover in Tetouan, Morocco using SVM, RF and CART by comparing five datasets based on Sentinel-2 Level-2A (S2 L2) imagery: the max, mean, median, min and mode composites.

Study Area
The city of Tetouan is located in northern Morocco, in the region of Tangier-Tetouan-Al Hoceima. It is bounded to the north by the provinces of Fahs Anjra and Mdiq Findeq, to the west by the prefecture of Tangier Assilah, to the east by the province of Chefchaouen and to the south by the province of Larache (Figure 1). It lies about 10 km from the Mediterranean Sea. The city is surrounded by two mountains, Dersa to the north and Ghorghiz to the south, and the Oued Martil river runs through it. The population is 578,283 in the city and 157,684 in the countryside, over an estimated area of 2,541 km² (Monograph-HCP, 2020). It has a Mediterranean climate with varying degrees of temperature (el Fatni et al., 2014). Tetouan belongs to the outer part of the Rif chain, which includes allochthonous formations (flysch nappes), pelitic-sandstone, on an autochthonous to para-autochthonous, dominantly pelitic unit (Maftahi et al., 2020). The highest altitude is 380 m (1247 ft) and the lowest is -2 m (-7 ft).

Method
The methodology began with the selection of a collection of Sentinel-2 Level-2A (S2 L2) images acquired from June to July 2022. The collection was cloud-masked, and five composite datasets were then generated to train the machine learning models. The models were also fed with training data selected from high-resolution imagery and were subsequently evaluated against testing data using accuracy assessments. The final result was a set of LULC maps. The detailed methodology, which is entirely based on the GEE environment, is presented in Figure 2.
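The division of the samples into 70% training and 30% testing data can be illustrated with a minimal sketch in plain Python. The paper does not state whether the split was made per class, per polygon or per pixel, so the per-class (stratified) random split below is an assumption, and the function name, the seed and the toy samples are hypothetical.

```python
import random

def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (features, label) samples into train/test sets, class by class.

    Assumption of this sketch: the split is random within each class, so
    every class keeps roughly the same train/test proportion.
    """
    rng = random.Random(seed)
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))

    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical toy samples: 10 per class, features are placeholder band values.
classes = ["W", "BL", "GL", "F", "U"]
samples = [([i, i + 1], c) for c in classes for i in range(10)]

train, test = stratified_split(samples)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, which matters when the same samples are reused to compare several composites and classifiers.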

Satellite Data
To meet the objective of this study, Sentinel-2 MSI Level-2A was used as the source data for LULC mapping. Level-2A is an orthorectified product providing bottom-of-atmosphere (BOA) reflectance (SENTINEL-2 User Handbook 2013). S2 L2 provides data with high temporal resolution and a wide spectral range, as indicated by (

Composite Images
The GEE cloud computing platform (https://earthengine.google.com; accessed on July 20, 2022) was employed to create the individual composite images from 23 Sentinel-2 (S2 L2) images taken between June and July 2022 over the study area. To limit cloud contamination, only images with less than 10% cloud cover were selected. Spectral bands 2-8 and 11-12 of the Sentinel-2 images were used in this study. To produce a single image from the 23 images, five statistical functions were applied: max, mean, median, min and mode.

Max composite
A composite image (Figure 4-A) generated by reducing a collection of images to the maximum value of each pixel across the stack, for each corresponding band (Google Developers, n.d.).

Mean composite
A composite image (Figure 4-B) obtained by reducing a collection of images to the average value of each pixel across the stack, for each corresponding band (Google Developers, n.d.).

Median composite
In statistics, when a list of values is sorted, the central value is called the median. In GEE, the median reducer reduces a collection of images by computing the median of all the values of each pixel across the stack, for each corresponding band (Figure 4-C) (Google Developers, n.d.).

Min composite
A composite image (Figure 4-D) generated by reducing a collection of images to the minimum value of each pixel across the stack, for each corresponding band (Google Developers, n.d.).

Mode composite
The most frequently occurring value in a dataset is called the mode. A mode composite (Figure 4-E) is generated by reducing a collection of images to the most frequent value of each pixel across the stack, for each corresponding band.
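The five reducers can be illustrated with a minimal sketch in plain Python, which is not the GEE implementation. The toy 2x2 single-band stack, its values and the helper names (composite, mode_min, REDUCERS) are hypothetical. Ties in the mode are broken by taking the smallest value, which is an assumption of this sketch and may differ from the behaviour of GEE.

```python
from statistics import mean, median, multimode

# Hypothetical toy stack: 5 acquisitions of a 2x2 single-band image
# (made-up integer reflectance values).
stack = [
    [[10, 20], [30, 40]],
    [[12, 20], [28, 44]],
    [[10, 26], [30, 40]],
    [[14, 20], [35, 48]],
    [[10, 24], [30, 40]],
]

def mode_min(values):
    """Most frequent value; ties broken by the smallest value (sketch assumption)."""
    return min(multimode(values))

REDUCERS = {"max": max, "mean": mean, "median": median, "min": min, "mode": mode_min}

def composite(stack, reducer):
    """Reduce a stack of images to one image by applying `reducer` per pixel."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [reducer([image[r][c] for image in stack]) for c in range(cols)]
        for r in range(rows)
    ]

composites = {name: composite(stack, fn) for name, fn in REDUCERS.items()}
for name, image in composites.items():
    print(name, image)
```

In GEE itself the corresponding reductions are available as methods of an image collection (max(), mean(), median(), min() and mode()), applied band by band.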

Training and Validation Sample Data
Five classes represent the dominant LULC in the study area: Water (W), Barren Land (BL), Grass Land (GL), Forest (F) and Urban (U). A total of 538 polygon samples (6148 pixels) (

Machine learning algorithms
Three machine learning algorithms offered by GEE, namely SVM, RF and CART, were used to train the classifiers on the Sentinel-2 composite images.

Support Vector Machine (SVM)
The support vector machine (SVM) is a supervised learning algorithm originally designed to solve binary classification problems. The method is based on the principle of structural risk minimization (SRM): it seeks the hyperplane that separates the classes with the largest margin, the margin being determined by the data points closest to the hyperplane (the support vectors) (Talukdar et al., 2020).

Random Forest Classifier (RF)
Random Forest is a machine learning method proposed by Breiman in 2001 (Feng et al., 2015). It can be regarded as an ensemble of several decision trees and is defined by equation (1).
$$\{\, h(x, \theta_k),\; k = 1, 2, \ldots \,\} \qquad (1)$$
where h = Random Forest classifier, x = input variables, and θk = independent and identically distributed random predictor variables used to split each decision tree.

Classification and Regression Tree (CART)
CART follows the same tree-based principle as RF but builds a single decision tree (Praticò et al., 2021). It works by recursively splitting nodes, based on a predefined threshold, until it reaches the terminal nodes (Loukika et al., 2021b).
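The node-splitting step can be sketched for a single feature as follows. This is a minimal illustration in plain Python, assuming the Gini impurity that is the usual criterion for CART classification trees; it is not the GEE classifier, and the function names and the toy band values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best split on one feature."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical near-infrared values (scaled integers) for two classes.
nir = [50, 80, 100, 350, 400, 450]
cls = ["water", "water", "water", "forest", "forest", "forest"]
threshold, impurity = best_split(nir, cls)
print(threshold, impurity)
```

A full tree applies this search to every band at every node and recurses on the two child nodes until a stopping rule is met. A random forest repeats the procedure on many resampled datasets and combines the resulting trees.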

LULC Maps and Classification Performances
This study examines the performance of the machine learning algorithms offered by GEE for LULC mapping by comparing five composites of Sentinel-2 (S2 L2) imagery built with the max, mean, median, min and mode statistical functions. As shown in Figure 4, three classifications were generated for each composite using SVM, RF and CART (Figure 4, 1 -> 15). Comparing the three algorithms shows that SVM outperforms RF and CART, with an average Overall Accuracy (aOA) of 87.99% and an average Kappa (aK) of 83.63%, against RF (aOA: 87.81%, aK: 82.60%) and CART (aOA: 84.72%, aK: 79.84%) (Figure 3). In terms of composite images, the highest average overall accuracy and average Kappa coefficient were obtained by the mean composite (aOA: 92.89%, aK: 90.59%), followed by the median composite (aOA: 90.14%, aK: 86.24%), the min composite (aOA: 86.84%, aK: 81.44%), the max composite (aOA: 85.97%, aK: 82.74%) and the mode composite (aOA: 78.35%, aK: 69.12%) (Figure 3).
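For reference, the two accuracy measures reported here can be computed from a confusion matrix as in the following minimal sketch in plain Python. The 3x3 matrix is hypothetical and is not taken from the study's results; rows are assumed to hold the reference classes and columns the predicted classes.

```python
def overall_accuracy(cm):
    """Share of correctly classified samples (diagonal over total)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def cohen_kappa(cm):
    """Kappa coefficient: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical confusion matrix for three classes.
cm = [
    [50, 3, 2],
    [4, 40, 6],
    [1, 5, 39],
]
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
print(f"OA = {oa:.2%}, Kappa = {kappa:.2%}")
```

Kappa is always lower than the overall accuracy when some agreement is expected by chance, which is consistent with the aK values above being below the corresponding aOA values.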

DISCUSSION
The majority of researchers prefer to use Landsat images in their studies (more than 784 studies until 2020 according to the review of ....). The current study instead uses Sentinel-2 images to produce the different composites. This choice is due to the morphology of the study area, which requires a good spatial resolution (10 m for the visible and near-infrared bands of Sentinel-2, compared with 30 m for Landsat). Furthermore, the study shows that SVM is the best performing algorithm for LULC mapping compared to RF and CART, which is confirmed by studies such as (Feizizadeh et al., 2021) and (Loukika et al., 2021c), while the study of (Loukika et al., 2021c) found that RF is the best. Some studies use the RF algorithm without comparing it to other algorithms, such as (Nasiri et al., 2022) and (Noi Phan et al., 2020). In the present study, three different classifiers (RF, SVM and CART) were tested on each composite image to evaluate their performance. Previous studies that discuss the image reduction process point out that the median is the most widely used reducer (Kollert et al., 2021; Noi Phan et al., 2020). In our case, after testing different composites, we obtained the best results with the mean composite. The worst composite in terms of accuracy was the mode (Figure 3). Table 3 suggests that the problem lay in the barren land and grass land classes. The cause may be the training data: some pixels were selected as barren land, but the most frequent values returned by the mode reducer at those pixels corresponded to grass land. To improve this result, selecting images by seasonal period may be useful (Praticò et al., 2021).

CONCLUSION
The main objective of this work was to propose a new approach for LULC mapping in the city of Tetouan, Morocco, by comparing several composite images and several machine learning algorithms on the Google Earth Engine platform. This objective has been achieved. The use of Google Earth Engine was necessary: without it, the study would have been expensive in terms of memory (image storage) and time (data processing). The main limitation of GEE remains the need for a good internet connection to speed up the execution of operations. The study concludes that the mean composite is a good solution for reducing a collection of images. However, many studies still use the median composite directly without testing the performance of the other composites, which runs counter to our finding and to that of (Praticò et al., 2021), the only study in the literature, to our knowledge, that compares different composites and concludes that the mean is the best. Further studies in this direction are therefore needed.

Figure 1 : Geographical location of the study area

Figure 3 : The overall accuracy and the Kappa index (%) of classification results for different datasets based on machine learning classifiers.

Figure 4 : LULC maps of different composite images using SVM, RF and CART classifiers

Table 2 :
... distribution in LULC classes. It should be noted that all samples were derived from visual interpretation of high-resolution Google Earth images. This method is commonly applied and mentioned in the literature (Noi Phan et al., 2020; de Sousa et al., 2020). The training data, representing 70% of the samples, were fed into the machine learning algorithms to create the LULC classification, while the remaining 30% of the samples, the validation data, were used to evaluate the performance of the classification.
LULC characteristics (pixel resolution = 10 m)

[Figure 4 panel labels: Max, Mean, Median and Min composites; SVM, RF and CART classifications for each composite]