Research on Sugarcane Recognition based on Joint Spatial and Spectral Information from Satellite-Based Hyperspectral Imagery

Abstract: The total amount of sugarcane planting in China ranks third in the world, and Guangxi is the largest sugarcane planting base in China. Knowing the annual planting distribution and scale of sugarcane across the whole region helps the Guangxi government and industry optimize the planting structure and improve the efficiency of resource utilization, and thereby supports the stable development of the Guangxi sugar industry. Based on satellite-based hyperspectral image data from 'Zhuhai No.1', this paper carries out sugarcane recognition and area extraction, which fills the application gap of satellite-based hyperspectral image data in extracting sugarcane area in China. A globally optimized extended random walk model is used to identify and extract sugarcane by combining the spatial and spectral information of hyperspectral imagery. First, the initial probabilities are estimated using the LOR classifier; then a weighted graph is constructed from the data after MNF dimensionality reduction; finally, probability optimization and recognition are carried out based on the extended random walk model. The results show that the joint spatial-spectral method based on the extended random walk can effectively identify sugarcane: for the 2021 image the recognition accuracy reaches 91.4% with a misrecognition rate of 2.8%, and for the 2020 image the recognition accuracy reaches 90.5% with a misrecognition rate of 3.1%.


Introduction
China is the third-largest sugarcane farming nation in the world, with Guangxi being the primary planting region. Information on the area and scale of sugarcane planted each year, as well as production forecasts, is urgently needed by the Guangxi government, sugar industry associations, insurance companies, and other related businesses in order to plan sugarcane transportation routes scientifically, determine the beginning and end of the crushing season, and develop pricing strategies that take market fluctuations and business benefits into account. A better understanding of the distribution and scale of sugarcane planting throughout the region each year helps the government and industry optimize the planting structure and enhance resource utilization efficiency, which in turn supports the steady growth of Guangxi's sugar industry. Against this background, this paper uses spaceborne hyperspectral remote sensing images as the data source and applies a globally optimized extended random walk model that combines the spatial and spectral information of hyperspectral images to identify and extract sugarcane. The work aims to fill the gap in the domestic application of spaceborne hyperspectral image data and, at the same time, to provide statistics that support the region-wide sugarcane output estimate. In addition to facilitating the successful transformation of Guangxi's sugarcane industry, this will aid the formulation of sugarcane and sugar policy, which benefits the macro-control of sugarcane production and the legislative protection of the sugar industry.
Scholars have studied a range of methods for extracting sugarcane planted area. In 2024, Yang Ni et al. extracted the sugarcane planted area in Chongzuo City, Guangxi from Sentinel-2 images; the overall classification accuracies were all higher than 91%, and the kappa coefficients were all greater than 0.88 (Yang Ni, 2024a). In 2022, Liang Jieyu et al. proposed a probabilistic information-based method (PIM) for sugarcane area extraction from Sentinel-2 images, with an overall accuracy of 0.98 and an F1 score of 0.97 (Liang Jieyu, 2022a). In the same year, Shyamal S. Virnodkar classified sugarcane from Sentinel-2 satellite data with a CNN model and obtained an accuracy of 88.46% (Shyamal S.Virnodkar, 2022a). In 2021, Xie Xinchang et al. applied the optimized 6-5-2 band combination of LANDSAT8-OLI remote sensing images, introduced NDVI, DEM and other auxiliary feature variables, and used Random Forest classification for sugarcane area identification, with overall classification accuracy higher than 92% and kappa coefficients all greater than 0.8 (Xie Xinchang, 2021a). In the same year, Kavita Bhosle et al. used a deep learning CNN model on EO-1 Hyperion hyperspectral images to classify sugarcane, cotton and mulberry trees with 99.33% classification accuracy (Kavita Bhosle, 2021a). Different data sources, methods and models, and study area sizes therefore affect the extraction accuracy of sugarcane area, but the reported accuracy can basically meet the requirements of sugar enterprises for the extraction of sugarcane plantation area, and the methods can be popularized and applied in the sugarcane industry.
In summary, scholars have carried out a great deal of research on the application of remote sensing technology to sugarcane area extraction and have achieved good results with mature technology, which promotes the development of the sugarcane industry. However, sugarcane area extraction in China mainly relies on multispectral remote sensing data and near-ground hyperspectral data, and there is no research on sugarcane remote sensing applications based on satellite-based hyperspectral data. Satellite-based hyperspectral remote sensing is thus rarely applied in China's sugarcane industry and has not yet been applied to sugarcane area extraction, which provides an opportunity to study, for the first time, the extraction of sugarcane area from satellite-based hyperspectral remote sensing images.

Study Area
The study area of this paper is located in Dongluo Town and Liuqiao Town, Fusui County, Chongzuo City, Guangxi Autonomous Region, south of the Tropic of Cancer, between longitude 107°31′-108°06′ E and latitude 22°17′-22°57′ N (Figure 1 below). It belongs to the subtropical humid monsoon zone, with an average annual temperature of about 21.7°C, an average annual rainfall of about 1300 mm, and a flatland elevation of 100-200 meters. The total area of the study area is about 495 square kilometers, accounting for 17.6% of the total area of Fusui County. Sugarcane is the main cash crop in the area, with a stable yield from year to year.
Figure 1. Area of study.

Data Sources
The hyperspectral image data used in this study come from the Chinese commercial satellite constellation "Zhuhai-1". "Zhuhai-1" is a commercial remote sensing satellite constellation launched and operated by Zhuhai Orbit Aerospace Science and Technology Company Limited, and it is the first satellite constellation built and operated by a private listed company in China. The whole constellation consists of 34 satellites, including video satellites, hyperspectral satellites, radar satellites, high resolution optical satellites and infrared satellites. Among them are 8 OHS hyperspectral satellites with the following parameters: spatial resolution of 10 m, altitude of 500 km above the ground, mass of 67 kg, imaging range of 150 km × 2500 km, 32 spectral bands, spectral resolution of 2.5 nm, spectral range of 400 nm-1000 nm, and an orbit inclination of 98°. This paper uses hyperspectral images from the OHS-2C satellite, which has 32 bands and a spectral wavelength range of 464 nm-946 nm (Figure 2 below, from https://www.myorbita.net/). OHS-2C hyperspectral images of the study area in 2020 and 2021 were acquired for this study.

Figure 2. OHS-2C COMS1 spectral response curve.
The sugarcane sample data and validation data are manually collected sugarcane map data from the "Digital Guangxi" project. These are accurate sugarcane planting maps obtained through various methods by various units under the Regional Department of Natural Resources. In addition, the research team collected 38 sugarcane planting areas in the field, as shown in Figure 3.

Hyperspectral Dimensionality Reduction
Hyperspectral images have a large number of bands, many of which carry little additional information and are highly redundant. Therefore, in this study, the OHS-2C hyperspectral remote sensing image data are subjected to the Minimum Noise Fraction (MNF) transform, which reduces the dimensionality of the image, reduces the noise, and determines the effective number of dimensions (i.e., the number of bands) in the image data, thus greatly reducing the amount of computation in data processing. The MNF is essentially a two-step cascaded principal component transform. The first step decorrelates and rescales the noise in the data and removes the correlation between bands; the second step is a standard principal component transform of the noise-whitened data. After the MNF transformation, the vast majority of the spatial and spectral information contained in the image data is concentrated in the first few bands (Zhu Jinsan, 2019a). In this paper, the OHS-2C hyperspectral image is pre-processed and then MNF transformed, and the first band is taken as the first principal component and represented as a weighted graph.
The mathematical derivation of the MNF transformation is as follows.
Step 1: A high-pass filter template is used to filter the whole image, or image data blocks with the same properties, to obtain the noise covariance matrix $C_N$, which is diagonalised into the matrix $D_N$, i.e.
$$D_N = \Phi^{T} C_N \Phi \qquad (1)$$
where $D_N$ is the diagonal matrix of the eigenvalues of $C_N$ in descending order and $\Phi$ is an orthogonal matrix consisting of the corresponding eigenvectors. Further transformation of Eq. (1) leads to the following equations:
$$I = Q^{T} C_N Q \qquad (2)$$
$$Q = \Phi D_N^{-1/2} \qquad (3)$$
When $Q$ is applied to the image data $X$, the original image is projected into a new space by the transform $Y = QX$, which produces transformed data whose noise has unit variance and no correlation between bands.
Step 2: A standard principal component transformation is applied to the noise-whitened data. The formula is as follows:
$$C_{D\text{-}adj} = Q^{T} C_D Q \qquad (4)$$
The transformation matrix of the MNF, $T_{MNF}$, is obtained from the above two steps: $T_{MNF} = QV$.
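The two-step procedure above can be illustrated with a minimal NumPy sketch. It is not the processing chain used in this study: the noise covariance is estimated here from differences between horizontally adjacent pixels (an assumption standing in for the high-pass filter template), and the function name `mnf_transform` and its parameters are hypothetical.

```python
import numpy as np

def mnf_transform(cube, n_components=3):
    """Sketch of a two-step MNF. cube has shape (rows, cols, bands)."""
    cube = np.asarray(cube, dtype=float)
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    X = X - X.mean(axis=0)

    # Assumption: noise is estimated from differences of neighbouring pixels.
    # The difference of two independent noise samples has twice the variance.
    noise = (cube[:, 1:, :] - cube[:, :-1, :]).reshape(-1, bands)
    C_N = np.cov(noise, rowvar=False) / 2.0
    C_D = np.cov(X, rowvar=False)

    # Step 1: diagonalise the noise covariance and build the whitening matrix Q.
    d_n, Phi = np.linalg.eigh(C_N)
    order = np.argsort(d_n)[::-1]          # eigenvalues in descending order
    d_n, Phi = d_n[order], Phi[:, order]
    Q = Phi / np.sqrt(d_n)                 # scales column j by d_n[j] ** -0.5

    # Step 2: principal component transform of the noise-whitened data.
    C_adj = Q.T @ C_D @ Q
    d_adj, V = np.linalg.eigh(C_adj)
    V = V[:, np.argsort(d_adj)[::-1]]

    T_mnf = Q @ V
    components = (X @ T_mnf[:, :n_components]).reshape(rows, cols, n_components)
    return components, T_mnf
```

Pixels are stored as rows in this sketch, so the projection is written `X @ T_mnf`. In the transformed space the estimated noise covariance becomes the identity matrix, which is the property that Step 1 is designed to achieve.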

LOR Classifier
The LOR classifier is the Logistic Regression classifier. A logistic regression classifier is a statistical method used to solve binary classification problems by learning a 0/1 classification model from training data features. The core idea is to use a linear combination of sample features as the independent variable and to map it onto the (0,1) interval using a logistic function, in order to predict the probability that a sample belongs to a particular category, as shown in Figure 3 below.

(1) First, let $U_M$ be the set of labelled pixels, where each $u_i \in U_M$ is defined to belong to a certain category $k \in \{1, \dots, K\}$, and $K$ denotes the total number of different categories.
(2) Given the labelled pixels, the purpose of the random walk algorithm is to compute the probability $P_{ik}$ that a random walker starting from any unlabelled pixel $i$ will first reach a labelled pixel belonging to the $k$th class.
(3) $P_{ik}$ is obtained by minimising the following energy function:
$$E^{k}_{spatial}(P_k) = P_k^{T} L P_k \qquad (6)$$
where $L$ denotes the Laplacian matrix of the weighted graph:
$$L_{ij} = \begin{cases} d(u_i) & \text{if } i = j \\ -w_{ij} & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{in other cases} \end{cases} \qquad (7)$$
In this matrix, $d(u_i)$ refers to the degree of the $i$th pixel, which is obtained by summing the weights of all the edges connected to node $u_i$. When a portion of the labelled pixels $U_M$ is known, the energy function shown in Eq. (6) has an analytical solution, which can be obtained by solving a system of linear equations.
Finally, for each pixel, the category with the highest probability is selected as the label of that pixel. An accurate supervised segmentation of the input image can thus be obtained based on the probability maximization criterion.
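As an illustration of how the probabilities follow from a linear system, the sketch below solves the seeded random walk problem on a small graph. It is a toy example under stated assumptions: the Laplacian is partitioned into labelled and unlabelled nodes, and the function name `random_walker` and the dictionary-based seed format are hypothetical.

```python
import numpy as np

def random_walker(W, seeds):
    """W: symmetric (n, n) edge-weight matrix; seeds: {node index: class index}.

    Returns an (n, K) array of class probabilities for every node.
    """
    n = W.shape[0]
    K = max(seeds.values()) + 1
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian: degree minus weights

    marked = sorted(seeds)
    free = [i for i in range(n) if i not in seeds]

    # One-hot indicator matrix of the labelled nodes.
    M = np.zeros((len(marked), K))
    for row, node in enumerate(marked):
        M[row, seeds[node]] = 1.0

    L_U = L[np.ix_(free, free)]             # unlabelled-to-unlabelled block
    B = L[np.ix_(marked, free)]             # labelled-to-unlabelled block

    P = np.zeros((n, K))
    P[marked] = M
    P[free] = np.linalg.solve(L_U, -B.T @ M)
    return P
```

On a chain of four nodes with unit weights, labelling the first node as class 0 and the last node as class 1 gives probabilities that vary linearly along the chain, and the probability maximization criterion then splits the chain in the middle.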

The Extended Random Walk:
The random walk segmentation algorithm is only applicable to supervised image segmentation, i.e., manual labelling is required (Xudong Kang, 2014). When $U_M$ is an empty set, the energy function shown in Eq. (6) cannot be solved. Moreover, the image segmentation framework based on random walk cannot effectively utilize the spectral information of hyperspectral images, so random walk segmentation of hyperspectral images using the training samples directly produces excessively smooth results. This problem can be solved by combining the spatial and spectral information of the image using the extended random walk algorithm. The extended random walk method defines a non-spatial energy function as follows:
$$E^{k}_{aspatial}(P_k) = \sum_{q=1,\, q \neq k}^{K} P_k^{T} \Lambda_q P_k + (P_k - 1)^{T} \Lambda_k (P_k - 1) \qquad (8)$$
where $\Lambda_k$ is a diagonal matrix whose diagonal elements are the initial random walk probabilities $r_{ik}$ of the nodes $u_i$. In this paper, we use LOR to estimate the initial probability that a hyperspectral image pixel belongs to a certain feature class and merge the spatial and non-spatial functions in the following way:
$$E^{k}(P_k) = E^{k}_{spatial}(P_k) + \gamma E^{k}_{aspatial}(P_k) \qquad (9)$$
where the parameter $\gamma$ controls the balance between the spatial and non-spatial terms. Similar to solving Eq. (6), the probability $P_k$ can also be obtained by solving a system of linear equations. Finally, as in the random walk segmentation algorithm, the category with the highest probability is selected as the label for each pixel. The specific implementation process of the extended random walk optimization classification in this paper is as follows:
Step 3: Based on the weighted graph and extended random walk theory, the spatial-spectral energy function shown in Eq. (9) is solved to obtain the optimized probability $P_{i,k}$.
Step 4: The classification result is obtained based on the probability maximization criterion, i.e. each pixel $i$ is assigned the class $\arg\max_{k} P_{i,k}$.
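The optimization and labelling in Steps 3 and 4 can be sketched as follows. The sketch assumes that minimising the combined spatial and non-spatial energy leads to the linear system $(L + \gamma \sum_q \Lambda_q) P_k = \gamma r_k$, which is the closed form of the prior-based random walker formulation; the function name `extended_random_walk` and the toy values of `gamma` are hypothetical and are not the settings used in this paper.

```python
import numpy as np

def extended_random_walk(W, R, gamma):
    """W: symmetric (n, n) edge weights; R: (n, K) initial class probabilities.

    Returns the optimized (n, K) probabilities and the per-node labels.
    """
    L = np.diag(W.sum(axis=1)) - W              # graph Laplacian
    prior_sum = np.diag(R.sum(axis=1))          # sum of the diagonal prior matrices
    A = L + gamma * prior_sum
    P = np.linalg.solve(A, gamma * R)           # solves all K systems at once
    return P, P.argmax(axis=1)

# Toy example: three pixels in a row; the middle pixel has a noisy initial estimate.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
R = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.9, 0.1]])
P_smooth, labels_smooth = extended_random_walk(W, R, gamma=0.1)
P_prior, labels_prior = extended_random_walk(W, R, gamma=1000.0)
```

With a small `gamma` the spatial term dominates and the middle pixel takes the label of its neighbours; with a very large `gamma` the result stays close to the initial probabilities. This is the balance that the parameter controls.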

Results And Analysis
In this paper, the acquired 2020 and 2021 OHS-2C hyperspectral images were classified with the global optimization method based on the extended random walk, and the sugarcane class was extracted and displayed separately to obtain the sugarcane identification results for 2020 and 2021 in the study area (as follows). The identified sugarcane area in 2020 is 149.7 km², with a recognition accuracy of 90.5% and a misidentification rate of 3.1%; the identified sugarcane area in 2021 is 147.3 km², with a recognition accuracy of 91.4% and a misidentification rate of 2.8%. The correct rate and error rate of sugarcane identification are based on comparison statistics with the sugarcane patches obtained from the Digital Guangxi materials, where $p_{acc}$ denotes the correct recognition rate of sugarcane. The identified totals are listed in Table 1, and the total for 2021 is very close to that in the relevant reports from government departments. The actual planted area of sugarcane in 2020 was higher than that in 2021. In addition, the recognition accuracy for 2020 is lower than that for 2021 and the misrecognition rate is higher, which indicates that images from different time phases affect the classification results.

This paper also investigated the effect of different parameter settings on the sugarcane identification results. Taking 2021 as an example, different combinations of the parameters $\gamma$ and $\beta$ had a significant effect on the results of sugarcane area recognition from hyperspectral imagery (as shown in the table below). When $\gamma = 10^{-5}$ and $\beta = 700$, the accuracy of sugarcane recognition is 89.7% and the misrecognition rate is 3.8%; when $\gamma = 10^{-6}$ and $\beta = 700$, the accuracy is the lowest, at 85.6%, and the misrecognition rate is the highest, at 5.1%; when $\gamma = 10^{-5}$ and $\beta = 750$, the accuracy is the highest, at 91.4%, and the misrecognition rate is the lowest, at 2.8%. For the other parameter configurations, the accuracy and misrecognition rates fell between these values. Therefore, when using the extended random walk model for image classification, the combination of $\gamma$ and $\beta$ needs to be tuned to achieve the best classification results. The lower accuracy relative to the CNN-based result of Kavita Bhosle et al. can be attributed to differences in spectral and spatial resolution between the data sources, in addition to the difference in models. Nevertheless, the results demonstrate the performance of feature classification in hyperspectral image data based on the extended random walk model. This will provide technical support for feature recognition based on hyperspectral images, especially sugarcane recognition, and it can help the local government understand the distribution and scale of sugarcane cultivation in the whole region every year and contribute to the development of the sugarcane industry in Guangxi.
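The statistics reported above can be computed from a classified map and a reference map along the following lines. This is only a sketch: the recognition accuracy is taken as the share of true sugarcane pixels that are labelled sugarcane, and the misrecognition rate is assumed, for illustration, to be the share of pixels labelled sugarcane that are not sugarcane in the reference map, which may differ from the definition used in this paper. The function name `sugarcane_stats` is hypothetical, and the 10 m pixel size is the OHS spatial resolution given in the Data Sources section.

```python
import numpy as np

def sugarcane_stats(pred, truth, pixel_size_m=10.0):
    """pred, truth: boolean arrays of the same shape (True = sugarcane)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    true_pos = np.count_nonzero(pred & truth)
    false_pos = np.count_nonzero(pred & ~truth)

    p_acc = true_pos / np.count_nonzero(truth)      # correct recognition rate
    p_err = false_pos / np.count_nonzero(pred)      # assumed misrecognition rate
    area_km2 = np.count_nonzero(pred) * pixel_size_m ** 2 / 1e6
    return p_acc, p_err, area_km2

# Toy example with eight pixels.
truth = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
pred = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=bool)
p_acc, p_err, area_km2 = sugarcane_stats(pred, truth)
```

At 10 m resolution each pixel covers 100 m², so an identified area of 149.7 km² corresponds to roughly 1.5 million pixels.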

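For Step 2 of the MNF transformation, $C_D$ is the covariance matrix of the image $X$ and $C_{D\text{-}adj}$ is the matrix obtained after transformation by $Q$. Further diagonalising $C_{D\text{-}adj}$ into the matrix $D_{D\text{-}adj}$, the formula is as follows:
$$D_{D\text{-}adj} = V^{T} C_{D\text{-}adj} V \qquad (5)$$
where $D_{D\text{-}adj}$ is the diagonal matrix of the eigenvalues of $C_{D\text{-}adj}$ in descending order and $V$ is an orthogonal matrix consisting of the corresponding eigenvectors.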

Figure 4. Principle diagram of LOR.

Specifically, the logistic regression model solves for a set of weights. The features of an input sample are linearly combined using these weights, and the result is converted by a sigmoid function into a probability value, which indicates the probability that the outcome is one. For the classification of new samples, the model calculates the probability value and then determines which category the sample belongs to based on a threshold value (usually 0.5). In this paper, the LOR classifier is constructed to calculate the probability that each pixel in a hyperspectral image belongs to a category.

The Extended Random Walk Model
Random walk model: The random walk based segmentation algorithm represents an image as a weighted graph consisting of nodes $u \in U$ and edges $e \in E$, where the nodes represent pixels in the image and the edges represent links connecting neighbouring pixels (Xudong Kang, 2014). The spatial correlation between pixels can be defined by assigning a weight $w_{ij}$ to each edge $e_{ij}$:
$$w_{ij} = e^{-\beta (v_i - v_j)^2}$$
where $v_i$ is the value of the $i$th pixel and $\beta$ is a free parameter.
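The logistic regression step described in this section (a linear combination of features, a sigmoid mapping, and a 0.5 threshold) can be sketched in a few lines of NumPy. This is a minimal binary example trained by plain gradient descent; the function names, learning rate and iteration count are hypothetical and do not describe the LOR implementation used in this paper.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping any real value onto the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Append a constant column so the last weight acts as the intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit weights by gradient descent on the mean log-loss."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(X, w):
    """Probability that each sample belongs to class 1."""
    return sigmoid(add_bias(X) @ w)

# Toy example: one feature, two classes.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y)
probs = predict_proba(X, w)
labels = (probs >= 0.5).astype(int)
```

For a multi-class problem such as land-cover mapping, one such model per class (or a softmax generalisation) would yield the per-pixel class probabilities that serve as initial estimates.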

Figure 5
Figure 5 below shows the framework diagram of the combined spatial and spectral classification algorithm for hyperspectral images based on the extended random walk. The method contains two main steps: in the first step, the initial probability that a pixel belongs to a certain category is estimated using LOR; in the second step, the global random walk energy function containing spatial and spectral terms is constructed, the initial random walk probability is optimized by obtaining its closed-form solution, and the final classification result is obtained based on the probability maximization criterion.
Step 1: Input the training set, construct the LOR, and calculate the probability $r_{i,k}$ that each pixel $i$ in the hyperspectral image belongs to the $k$th class.
Step 2: The MNF transform is performed on the hyperspectral image $Y$, and the first band obtained by the transformation is taken as the first principal component $U$ and expressed in the form of a weighted graph. The node $u_i \in U$ in the weighted graph represents the value of the $i$th pixel in the principal component, while the edge $e_{ij}$ between neighbouring nodes carries the weight $w_{ij}$.
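The weighted graph in Step 2 can be built from a single-band image as sketched below. The sketch assumes 4-neighbour connectivity and a Gaussian weight of the form exp(-beta * (v_i - v_j)^2), a common choice in random walk segmentation; both are assumptions for illustration, and the function name `grid_weights` is hypothetical. A dense matrix is used for clarity, whereas a real image would need a sparse representation.

```python
import numpy as np

def grid_weights(img, beta):
    """Return the (n, n) edge-weight matrix of a 4-connected pixel grid."""
    rows, cols = img.shape
    v = img.ravel().astype(float)
    idx = np.arange(rows * cols).reshape(rows, cols)

    # Pairs of horizontally and vertically adjacent pixel indices.
    horizontal = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1)
    vertical = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1)

    W = np.zeros((rows * cols, rows * cols))
    for i, j in np.concatenate([horizontal, vertical]):
        # Similar pixel values give a weight near 1, dissimilar values near 0.
        W[i, j] = W[j, i] = np.exp(-beta * (v[i] - v[j]) ** 2)
    return W

# Toy example: a 2 x 2 image in which one pixel differs from the rest.
img = np.array([[0.0, 0.0],
                [0.0, 1.0]])
W = grid_weights(img, beta=1.0)
```

Edges that cross a boundary between dissimilar pixels receive small weights, so the random walk is less likely to cross them, which is what preserves object edges during probability optimization.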
total number of true sugarcane pixels.

In this paper, the total sugarcane area obtained from the classification of hyperspectral imagery is 149.7 km² for 2020 and 147.3 km² for 2021 (Table 1).

Based on hyperspectral remote sensing data, the extended random walk model can be applied effectively to sugarcane recognition. However, compared with Kavita Bhosle et al., who used a deep learning CNN for sugarcane classification based on EO-1 Hyperion hyperspectral images, the classification accuracy obtained in this paper is lower.