Time-series GF-1 WFV monitoring of sugarcane using a Random Forest algorithm in South China

Sugarcane is widely grown in southern China and plays an important role in the sugar industry. Remote sensing technology is used to monitor sugarcane over large areas. However, optical satellite data coverage is affected by rainy weather, especially during the grand growth period of sugarcane. GF-1 WFV has a wide swath of 800 km and a short revisit time, which makes it ideal data for this study area. In this paper, the random forest model was chosen to obtain a precise classification of sugarcane based on time-series band values and 5 spectral indices. The overall accuracy is 89.73% and the Kappa coefficient is 0.65, which satisfies the needs of overall sugarcane extraction over large areas in southern China. Furthermore, decision tree classification was chosen as a comparative experiment.


Introduction
Sugarcane is an important sugar crop as well as an important bioenergy crop, and it is widely planted in southern regions of China such as Guangxi, Yunnan, Guangdong, and Hainan. Monitoring sugarcane cultivation is extremely important for estimating sugarcane yield and for the formulation, by the relevant departments, of policies and production plans related to the development of the sugar industry (Chen et al., 2022; Linjiang et al., 2020). Remote sensing technology has been widely used in agriculture, and results have been achieved in monitoring crop growth on a large scale (Abdel-Rahman and Ahmed, 2008; Aguiar et al., 2011; El Hajj et al., 2009; Gers; Mulianga et al., 2013; Rudorff et al., 2009; Xavier et al., 2006).
In recent years, with the development of satellite technology, long time series of high-resolution satellite images have become suitable for agricultural monitoring. Timely multi-temporal satellite image data are widely used in monitoring sugarcane planting and growth. Mulianga et al. forecasted sugarcane yield using a time series of Moderate Resolution Imaging Spectroradiometer (MODIS) data over nine years (Mulianga et al., 2013). Landsat Thematic Mapper (TM) and Landsat Enhanced Thematic Mapper Plus (ETM+) time series were used to analyse sugarcane in Brazil (Vieira et al., 2012). He Yajuan et al. performed sugarcane leaf area index inversion and yield estimation based on SPOT 4 and SPOT 5 data, and concluded that there is a significant positive correlation between sugarcane Leaf Area Index (LAI) and Normalized Difference Vegetation Index (NDVI) at different growth stages (He et al., 2013). Zeng Zhikang et al. studied crop cultivation information extraction based on Chinese satellite images with high temporal and spatial resolution, comprehensively applying ZY-3, HJ-1, and GF-1 thematic and spectral information to obtain time-series spectral features during the critical period of crop growth; the overall classification accuracy was 86.80% (Zhi-Kang et al., 2017). Qin Zelin et al. used 2 m resolution GF-1 imagery as the data source and extracted crop information for sugarcane, rice, and banana with an object-oriented classification method, using spectral values, the normalized vegetation index, gray values, and other image features (Ze-Lin et al., 2017).
Sugarcane information extracted from single-date remote sensing images is easily confused with non-sugarcane crops, garden land, and grass cover, so the accuracy of the extraction results is low. Time-series remote sensing images, by contrast, make it possible to identify the seasonal stages of sugarcane growth from its growth pattern, and thus to differentiate sugarcane from other crop types and extract its planting range, which is a scientific and effective method.
The swath width of high-resolution satellites such as WorldView, Pleiades, and GeoEye is less than 20 km. Sugarcane is distributed in southern China, where the weather is often cloudy, so such high-resolution data are not suited to time-series monitoring in China. GF-1 was successfully launched in 2013 with two cameras providing 2 m panchromatic and 8 m multispectral resolution and four 16 m resolution wide-swath cameras with a swath of 800 km, which makes it ideal data for time-series coverage of the sugarcane growing areas of China.
In this study, the range of sugarcane cultivation is extracted from GF-1 16 m wide-swath camera data as an example, in order to explore feasible technical methods and to provide a methodological reference for monitoring sugarcane as well as other crops.

Study Area and Data
Part of southern China was selected as the study area, covering an area of 114.50 km2 located within the national sugarcane production base. GF-1 WFV time-series images of the study area were acquired for five time phases from April to October 2015; each contains four bands (R, G, B, and NIR) with a resolution of 16 m. High-resolution GF-2 imagery was chosen for sample collection.

Data Processing
Based on an analysis and summary of existing crop monitoring methods, we take the extraction of the sugarcane planting range as an example. In the key growing period of sugarcane, the index showing the maximum difference between sugarcane and other crop types is used to explore applicable methods with strong operability, which is the foundation for large-scale popularization and application.
Firstly, the GF-1 time-series remote sensing images and the high-resolution remote sensing images of the same period were orthorectified to the World Geodetic System 1984 (WGS84) datum. Samples of sugarcane, non-sugarcane crops, garden land, forest and grass cover, bare ground, water areas, and buildings (structures) were then collected on the high-resolution remote sensing images for information extraction and analysis (Xulong et al., 2006). Finally, the Random Forest Model and the Decision Tree Model were chosen for this research.

NDVI characterization of samples
Spatial overlay analysis of the surface cover samples with the NDVI raster results was carried out, and raster statistics were computed to determine the minimum, maximum, mean, and variance of the NDVI within the different types of samples, giving the NDVI of the seven surface cover types. The statistical results are shown in Table 4.
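The per-class NDVI statistics described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the function names, the tuple layout of the samples, and the reflectance values are all hypothetical, and population variance is assumed because the text does not say which variance was used.

```python
from statistics import mean, pvariance


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def ndvi_stats_by_class(pixels):
    """Group (class_name, red, nir) samples by class and summarise their NDVI."""
    grouped = {}
    for cls, red, nir in pixels:
        grouped.setdefault(cls, []).append(ndvi(nir, red))
    return {
        cls: {
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
            "variance": pvariance(vals),  # population variance (assumed)
        }
        for cls, vals in grouped.items()
    }


# Hypothetical reflectance values, for illustration only.
samples = [
    ("sugarcane", 0.05, 0.45),
    ("sugarcane", 0.06, 0.40),
    ("water", 0.04, 0.02),
    ("water", 0.05, 0.03),
]
stats = ndvi_stats_by_class(samples)
for cls, s in stats.items():
    print(cls, {k: round(v, 3) for k, v in s.items()})
```

Repeating this for each acquisition date would give, per cover type, the kind of NDVI time curve that separates sugarcane from the other classes.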

Random Forest Model
Random Forest (RF) is an ensemble learning method with high classification accuracy that can also determine variable importance (Breiman et al., 2001). It combines many tree-type classifiers, which can also be used for regression, into a single predicted result. Each tree is grown on a random subset of the original training data, which it splits at a large number of nodes, with the split at each node determined from the user-defined input variables.
The accuracy of the classification depends on finding a suitable split at each node.
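To illustrate what a suitable split means, the sketch below searches for the threshold on a single feature that minimises the weighted Gini impurity of the two child nodes. The split criterion is not stated in this paper, so Gini impurity is an assumption here, and the function names and NDVI values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # baseline: impurity of the unsplit node
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical feature values cannot be separated
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical NDVI values for two cover types, for illustration only.
ndvi_values = [0.10, 0.15, 0.20, 0.60, 0.70, 0.75]
classes = ["bare", "bare", "bare", "sugarcane", "sugarcane", "sugarcane"]
threshold, impurity = best_split(ndvi_values, classes)
print(threshold, impurity)
```

A random forest repeats this kind of search over a random subset of the input variables at every node of every tree and then combines the trees by majority vote.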
RF has shown good performance in agricultural classification using satellite images in previous research, and it can estimate the relative importance of the input variables even with large datasets and many input variables (Xulong et al., 2006; Ze-Lin et al., 2017). In this research, the Random Forest Model was used for sugarcane classification because it is suited to producing precise results with high-dimensional data and to outlier detection (Gislason et al., 2006). The variables are listed in the table and comprise four bands (Red, Green, Blue, and Near Infrared) and 5 spectral indices (NDVI, NDWI, SAVI, RVI, and NIRv). The number of sub-decision trees of the classification model is set to 300. The model is constructed from the above features with 80% of the samples, and the remaining 20% of the samples are used to verify the classification accuracy.
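The paper does not give the index formulas, so the sketch below uses their commonly cited definitions as an assumption: NDWI in the green/NIR form (GF-1 WFV has no shortwave infrared band), SAVI with a soil factor of 0.5, and NIRv as NDVI multiplied by NIR reflectance. It also assumes that every acquisition date contributes its own nine variables to the feature vector. The function names and reflectance values are hypothetical, and non-zero denominators are assumed.

```python
INDEX_NAMES = ("NDVI", "NDWI", "SAVI", "RVI", "NIRv")


def spectral_indices(green, red, nir, soil_factor=0.5):
    """Five spectral indices for one pixel from reflectance values."""
    ndvi = (nir - red) / (nir + red)
    ndwi = (green - nir) / (green + nir)  # green/NIR form (assumed)
    savi = (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    rvi = nir / red
    nirv = ndvi * nir
    return {"NDVI": ndvi, "NDWI": ndwi, "SAVI": savi, "RVI": rvi, "NIRv": nirv}


def feature_vector(time_series):
    """Stack 4 bands and 5 indices per date into one flat feature list."""
    features = []
    for blue, green, red, nir in time_series:
        idx = spectral_indices(green, red, nir)
        features.extend([red, green, blue, nir])
        features.extend(idx[name] for name in INDEX_NAMES)
    return features


# Hypothetical (blue, green, red, nir) reflectances of one pixel on five dates.
pixel = [
    (0.04, 0.06, 0.05, 0.30),
    (0.04, 0.07, 0.05, 0.35),
    (0.03, 0.06, 0.04, 0.45),
    (0.03, 0.06, 0.04, 0.50),
    (0.04, 0.07, 0.06, 0.40),
]
vec = feature_vector(pixel)
print(len(vec))  # 5 dates x 9 variables = 45
```

Under these assumptions, five time phases give 45 input variables per pixel, which is the kind of high-dimensional input that the random forest is said to handle well.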

Random Forest Model Result
Using the existing results as a reference, the accuracy of the extraction results was evaluated with the confusion matrix and the Kappa coefficient, and the results are shown in Table 6.
Figure 5. The extraction result of the sugarcane planting range.
The overall accuracy was calculated to be 89.73% with a Kappa coefficient of 0.65. The spatial overlay analysis and accuracy evaluation show that the overall accuracy of the extraction results is high, which can meet the need to grasp the overall situation of sugarcane planting over a large area. The importance of the variables in the random forest model was also calculated. The blue band on 3 April, the NDVI on 19 April, and the red band on 21 October were the most important. The comparatively important variables were mostly from April, which is the germination period of sugarcane in the research area.
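For reference, the overall accuracy and the Kappa coefficient can both be derived from a confusion matrix, with Kappa defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The sketch below is a minimal illustration of that calculation; the labels are hypothetical and are not the validation samples of this study.

```python
def confusion_matrix(truth, predicted, classes):
    """Rows are reference classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(truth, predicted):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total x column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical reference and predicted labels, for illustration only.
truth = ["cane"] * 4 + ["other"] * 6
pred = ["cane", "cane", "cane", "other"] + ["other"] * 5 + ["cane"]
cm = confusion_matrix(truth, pred, ["cane", "other"])
oa, kappa = overall_accuracy_and_kappa(cm)
print(cm, round(oa, 3), round(kappa, 3))
```

Because Kappa discounts chance agreement, it can be much lower than the overall accuracy when one class dominates the samples.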

Comparison With the Decision Tree Classification
In addition, decision tree classification, one of the traditional methods for large-area classification, was applied to the same samples as a comparative experiment; the results are shown in Table 7. Its overall accuracy is 85.64% with a Kappa coefficient of 0.65. Compared with this traditional method, the RF model improves the overall accuracy by about 4 percentage points (89.73% versus 85.64%).
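A comparison of this kind could be set up as in the sketch below, which assumes scikit-learn and NumPy are available; the paper does not state which software was used. The data are synthetic stand-ins (700 samples, 45 features, 7 classes), a CART-style tree is assumed for the decision tree, and its depth limit is chosen here only for symmetry. The 300 trees, maximum depth of 10, and 80%/20% split follow the settings reported in this paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the sample table (hypothetical, not the study data).
rng = np.random.default_rng(0)
n_classes, n_features = 7, 45
y = np.repeat(np.arange(n_classes), 100)
centers = rng.normal(size=(n_classes, n_features))
X = centers[y] + rng.normal(scale=1.5, size=(y.size, n_features))

# Random 80% / 20% split for training and validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "random forest": RandomForestClassifier(
        n_estimators=300, max_depth=10, random_state=0
    ),
    # The depth limit of the single tree is an assumption.
    "decision tree": DecisionTreeClassifier(max_depth=10, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = (
        accuracy_score(y_test, pred),
        cohen_kappa_score(y_test, pred),
    )
    print(name, scores[name])

# Rank the input variables by their importance in the random forest.
importances = models["random forest"].feature_importances_
top_three = np.argsort(importances)[::-1][:3]
print(top_three)
```

The numbers printed for synthetic data say nothing about the accuracies reported here; the sketch only shows how both models can be trained on the same samples and scored with the same metrics.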

Figure 3. The flow chart of this study.

Figure 4. Curves of the characteristic statistical values of the samples' average NDVI.
Variables of the random forest.
Based on the random sampling strategy, 80% of the samples are used for training the random forest classification model, and the maximum depth of each sub-decision tree is set to 10.

Figure 6. The importance of variables in the RFM.

Table 3. Parameters of GF-1 satellite data
The Rational Polynomial Coefficient (RPC) model (Liu et al., 2015) and Digital Elevation Model (DEM) data are used to orthorectify the remote sensing images of each time phase.

Table 2. Parameters of GF-2 satellite data

Table 4. The characteristic statistical values of the samples' average NDVI.

Table 6. The accuracy evaluation of the RFM.

Table 7. The accuracy evaluation of the decision tree classification.
The combination of high spatial resolution and high temporal resolution remote sensing images for crop extraction has great potential for application. It can give full play to the advantages of high temporal resolution images for time-series monitoring and for capturing the key seasonal periods of crop growth, which is the key to crop extraction; it can also use high spatial resolution images to interpret crop categories accurately and to verify the accuracy of crop extraction. In this study, GF-1 WFV time-series remote sensing images were combined with high-resolution remote sensing images, which can satisfy the needs of crop extraction. The Random Forest model gives a more precise sugarcane classification over large areas than the decision tree model. The satellite data from April are the most important features of the Random Forest model for sugarcane extraction in this paper.