RESEARCH ON NODE NETWORK TRANSMISSION CAPACITY PREDICTION MODEL FOR LARGE SCALE REMOTE SENSING DATA COLLECTION

In recent years, the use of remote sensing technology has grown rapidly in industries such as agriculture, forestry, and urban planning. Remote sensing data collection systems rely on a network of nodes to collect and transmit data, and the transmission capacity of this node network is a critical factor in the performance and efficiency of the entire system. However, accurately predicting the transmission capacity of a node network is challenging. To carry out large-scale open remote sensing data collection, the network transmission capacity of the nodes must be predicted, taking into account the differences in the execution speed of each node for various tasks. In this research, we propose a node network transmission capacity prediction model for large-scale remote sensing data collection that combines the Particle Swarm Optimization (PSO) and Backpropagation (BP) algorithms. The proposed PSO-BP model aims to accurately predict the transmission capacity of a node network in a remote sensing data collection system. The model is tested and evaluated on a real dataset collected from Sentinel-2, and the results show that it outperforms the traditional BP model in terms of prediction accuracy. This work contributes to the field of remote sensing data collection by providing a reliable and efficient method for predicting the transmission capacity of node networks.


INTRODUCTION
Space remote sensing technology is entering the era of industrial applications, in which Landsat, Sentinel, and other public-interest satellite remote sensing data play an important role (Hemati et al., 2021; Phiri et al., 2020; Segarra et al., 2020). With the development of remote sensing technologies and the advancement of deep learning-based algorithms, significant progress has been achieved in recent years in many remote sensing tasks, for example, object detection from remote sensing images (Cheng and Han, 2016; Deng et al., 2018; Li et al., 2020; Qian et al., 2020; Song et al., 2020; Wang et al., 2021; Zhang et al., 2019; Zhu et al., 2021), remote sensing change detection (Hecheltjen et al., 2014; Jensen and Im, 2007; Khelifi and Mignotte, 2020; Shi et al., 2022; Q. Wang et al., 2018; Zhang et al., 2021), and remote sensing big data (Chi et al., 2016; Deren et al., 2014; Liu et al., 2018; Ma et al., 2015, 2021; Yu et al., 2021a, 2021b). The number of remote sensing data users, such as individuals and small and medium-sized enterprises, is gradually increasing. In practice, satellite data are generally released publicly on the Internet for global users, and their acquisition is limited by both the service capability of the data source itself and the acquisition capability of the data user nodes. Because of these constraints, large-scale public remote sensing data collection remains inefficient and user collection nodes remain underutilized (Lee et al., 2011; Wang et al., 2022). However, there are few studies on network node transmission capacity prediction. To carry out large-scale open remote sensing data collection in crowdsourcing mode, it is necessary to predict the network transmission capacity of nodes, considering the differences in the execution speed of each node for various tasks.

The traditional Backpropagation (BP) algorithm establishes the prediction model using the connection weights between the input and hidden layers and between the hidden and output layers. These weights are randomly initialized, and the learning factors are randomly generated constants. The algorithm is highly sensitive to these coefficients during training and is prone to long iteration times and to becoming trapped in local extrema. This study analyses the possible factors affecting the transmission speed of the node network at the levels of the data source, the transmission medium, and the receiving terminal of open remote sensing data collection, and selects a set of controllable factors to establish a prediction model of network transmission capacity. To overcome the shortcomings of the traditional BP neural network, such as its sensitivity to the initial weights and its tendency to fall into local minima, the Particle Swarm Optimization (PSO) algorithm, which has good robustness and global search capability, is combined with the BP neural network to form an improved PSO-BP algorithm. The improvements include a gradient term, a momentum factor, and an inertia weight adjustment function. To further improve accuracy during iteration, an accuracy dynamic adjustment function is introduced, and the learning factors are also adjusted dynamically. Finally, the improved PSO-BP algorithm is compared with the traditional BP algorithm on a real data set collected from Sentinel-2.

Analysis of factors affecting the transmission capacity of node networks
This paper considers the data transmission in large-scale public remote sensing data collection under the crowdsourcing model.
During the collection process, data are transmitted from the public remote sensing data source through multiple transmission media to the collection node. The data transmission process mainly involves "two sources and one medium", namely the data source, the transmission medium, and the receiving terminal. This paper analyses the factors affecting node network transmission capacity at each of these three stages; Figure 1 summarizes these factors.

Constraint analysis based on the data source side
The main factors that affect the transmission capacity of a node network at the data source are related to the physical configuration and server parameter settings of the data source server.
In order to ensure the long-term stability and security of the data source, the number of concurrent accesses, IPs, access frequencies, access bandwidth speeds, allowed access time periods, and allowed access areas of users are usually restricted based on the physical configuration of the server node.
Long-term collection has shown that when a single user frequently accesses the USGS data source over a long period, the user's order production speed is relatively slow, and the data download speed in the morning is much faster than in the afternoon. The data source is closed for maintenance every Wednesday (Beijing time). User information must be provided when downloading, and the download speed is fastest in the early morning (Beijing time).

Constraint analysis based on transmission medium
In long-term practice, it has been found that acquisition nodes in different geographical locations also differ significantly in transmission speed. We therefore hypothesize that, when data are transmitted through the network medium, the number of network sites differs between spatial regions and the service capacity of the configured network bandwidth varies from one region to another. Similarly, within the same spatial region and network configuration, different physical facilities such as routers, cables, and switches also affect the transmission speed of data in the network medium.
To verify the effect of network topology on the transmission speed of data in the network medium, experiments with relay service nodes were conducted. First, with the public data source as the source end and the relay node as the receiving end, the relay service node host collected a certain amount of remote sensing data directly from the public data source and stored it on its local physical hard disk. Then, with the relay node as the source end and the collection nodes as the receiving ends, the other collection nodes downloaded the data from the relay node to their local storage.
The experimental results show that collecting public remote sensing data through the relay node is indeed much faster than collecting it without the relay node; however, because of the high cost of the relay node, this approach is not suitable for large-scale, long-term public remote sensing data collection.

Constraint analysis based on the acquisition terminal
Among the many factors that affect the network transmission capability of the acquisition terminal, the node's own network bandwidth configuration plays a dominant role: a higher bandwidth configuration is more beneficial for network data transmission. While the collection terminal is working, other applications that occupy a large amount of network bandwidth seriously affect the node's ability to collect data over the network. When third-party software on the terminal limits the network bandwidth of the collection client, the efficiency of network data transmission is also affected. The number of threads, task type, start time, and task length set by the terminal collection client also affect the network transmission speed of the node. Because these factors are controllable at the collection terminal, they can be used as inputs to the subsequent network transmission capability prediction model. In addition, physical hardware on the node, such as the disk read/write speed and the network card, also affects its network transmission capability.

Model parameter feature selection
To establish a model that accurately predicts the network transmission capabilities of crowdsourced collection nodes, the factors that affect data in the network transmission environment were analysed in the previous section at the three stages of data source, transmission medium, and receiving terminal. Many of these factors are uncertain and not controlled by the receiving terminal, which makes their intrinsic patterns difficult to discover; therefore, they cannot be included in the node network transmission capability prediction model. Given that publicly available remote sensing data collection must be long-term, stable, inexpensive, and practical, these uncontrollable factors are discarded. Leaving aside data source-side factors and physical factors, a set of controllable factors is selected from the perspective of dynamically adjusting the client application parameters of the collection terminal. Using machine learning, a model for predicting node network transmission capabilities is established. The model is trained on the node's actual historical collection records, enabling it to predict the node's network transmission speed.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-M-1-2023 39th International Symposium on Remote Sensing of Environment (ISRSE-39) "From Human Needs to SDGs", 24-28 April 2023, Antalya, Türkiye. This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLVIII-M-1-2023-25-2023 | © Author(s) 2023. CC BY 4.0 License.

PSO
PSO is a swarm intelligence optimization algorithm with the advantages of simplicity, fast convergence, and easy implementation, and it is widely used in scheduling optimization, data mining, model training, and other applications (D. Wang et al., 2018).
In a population of N particles searching a D-dimensional space, the position of the i-th particle is denoted as $X_i = (x_{i1}, x_{i2}, \ldots, x_{iD})$, which represents a possible solution of the problem. The velocity of the particle is also a D-dimensional vector, $V_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, which determines the direction and distance of the particle's movement. Each velocity component is bounded by the maximum velocity $v_m$: when $v_{ij} > v_m$, $v_{ij} = v_m$. Similarly, each displacement component $x_{ij}$ is bounded by $x_m$: when $x_{ij} > x_m$, $x_{ij} = x_m$, and the fitness value of the particle is then calculated from the objective function. Bounding the velocity and displacement in this way prevents the particles from searching blindly. During the iterations, the velocity and displacement of each particle are updated as follows:

$$ v_{ij}^{k+1} = w\,v_{ij}^{k} + c_1 r_1 \left(p_{ij}^{k} - x_{ij}^{k}\right) + c_2 r_2 \left(p_{gj}^{k} - x_{ij}^{k}\right), \qquad x_{ij}^{k+1} = x_{ij}^{k} + v_{ij}^{k+1} \tag{1} $$

In Equation (1), k is the current iteration number and w is the inertia weight coefficient; the larger the inertia weight, the stronger the global search ability of the particle. $c_1$ and $c_2$ are the learning factors: $c_1$ describes the influence of the particle's individual extremum, which gives the particle global search ability and helps it avoid local solutions, and $c_2$ represents the influence of the global optimum on the particle. $r_1$ and $r_2$ are random numbers in (0, 1). Together, w, $c_1$, and $c_2$ determine the search ability of the particle in the solution space. The best-fitness position found by particle i during the iterations is its individual extremum, $P_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$, and the best-fitness position found by all particles in the population is the global extremum, $P_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$.
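The velocity and displacement update of Equation (1) can be sketched for a single particle as follows. This is a minimal illustration, not the authors' code: `pso_step` is a hypothetical helper, and clamping to the symmetric ranges $[-v_m, v_m]$ and $[-x_m, x_m]$ is an assumption (the text only states the upper bounds).

```python
import random

def pso_step(x, v, p_best, g_best, w, c1, c2, v_max, x_max, rng=random):
    """One update of Equation (1) for a single particle (hypothetical helper).

    x, v, p_best and g_best are lists of length D. Velocity and position
    components are clamped symmetrically (an assumption of this sketch).
    """
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
        vj = (w * v[j]
              + c1 * r1 * (p_best[j] - x[j])   # pull toward the individual extremum
              + c2 * r2 * (g_best[j] - x[j]))  # pull toward the global extremum
        vj = max(-v_max, min(v_max, vj))           # bound by the maximum velocity
        xj = max(-x_max, min(x_max, x[j] + vj))    # bound by the maximum displacement
        new_v.append(vj)
        new_x.append(xj)
    return new_x, new_v
```

A particle that already sits at both its individual and global extremum with zero velocity does not move, which is a quick sanity check on the update.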

PSO-BP Algorithm
In traditional BP neural network algorithms, the weights and thresholds are randomly generated, so the results vary from run to run and can be unreliable. This paper therefore proposes the PSO-BP neural network algorithm. First, the particles of the PSO algorithm are initialized as the initial weights of the BP neural network, and then the PSO iterations begin. Each PSO iteration is followed by an execution of the BP neural network algorithm. The termination condition of the BP training is determined by the prediction accuracy, which is adjusted dynamically as the number of PSO iterations increases, so the BP termination condition becomes stricter in the later stages of the PSO algorithm. After each BP run, the absolute value of the prediction error is used to evaluate the fitness of each PSO particle. When the PSO algorithm satisfies its iteration termination condition, the PSO-BP algorithm also ends.
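The hybrid loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`forward`, `abs_error`, `bp_refine`, `pso_bp`) are hypothetical; the network has no biases and a linear output; BP is plain per-sample gradient descent; the BP-refined weights are written back into the particle; w, c1 and c2 are held constant; and the tolerance `0.9 ** k` merely stands in for the accuracy dynamic adjustment function.

```python
import math
import random

D_IN, D_H = 6, 10
DIM = D_IN * D_H + D_H   # Equation (2) with one output neuron: 70 weights
SPLIT = D_IN * D_H       # first 60 entries: input->hidden weights (assumed row-major)

def forward(w, x):
    """Sigmoid hidden layer, linear output, no biases (all assumptions)."""
    h = []
    for j in range(D_H):
        z = sum(x[i] * w[i * D_H + j] for i in range(D_IN))
        h.append(1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z)))))
    return h, sum(h[j] * w[SPLIT + j] for j in range(D_H))

def abs_error(w, data):
    """Fitness: sum of absolute prediction errors (smaller is better)."""
    return sum(abs(y - forward(w, x)[1]) for x, y in data)

def bp_refine(w, data, tol, lr=0.1, max_epochs=10):
    """Hypothetical BP step: gradient descent on squared error until mean |error| <= tol."""
    w = list(w)
    for _ in range(max_epochs):
        if abs_error(w, data) / len(data) <= tol:
            break
        for x, y in data:
            h, out = forward(w, x)
            e = out - y
            for j in range(D_H):
                g_hid = e * w[SPLIT + j] * h[j] * (1.0 - h[j])  # uses the old output weight
                w[SPLIT + j] -= lr * e * h[j]
                for i in range(D_IN):
                    w[i * D_H + j] -= lr * g_hid * x[i]
    return w

def pso_bp(data, n=8, m=10, w_in=0.7, c1=1.5, c2=1.5, xm=1.0, vm=0.2, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-xm, xm) for _ in range(DIM)] for _ in range(n)]  # initial BP weights
    vel = [[0.0] * DIM for _ in range(n)]
    pbest = [p[:] for p in pos]
    pfit = [abs_error(p, data) for p in pos]
    g = min(range(n), key=lambda i: pfit[i])
    gbest, gfit = pbest[g][:], pfit[g]
    history = [gfit]
    for k in range(m):
        tol = 0.9 ** k  # hypothetical BP tolerance that tightens with k
        for i in range(n):
            for d in range(DIM):  # PSO move, Equation (1), with clamping
                v = (w_in * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vm, min(vm, v))
                pos[i][d] = max(-xm, min(xm, pos[i][d] + vel[i][d]))
            pos[i] = bp_refine(pos[i], data, tol)  # BP run after the PSO move
            f = abs_error(pos[i], data)            # fitness of the refined particle
            if f < pfit[i]:
                pbest[i], pfit[i] = pos[i][:], f
                if f < gfit:
                    gbest, gfit = pos[i][:], f
        history.append(gfit)
    return gbest, gfit, history
```

Calling `pso_bp(data)` with a list of `(features, target)` pairs, each holding six normalized input values, returns the best weight vector, its fitness, and the per-iteration history of the global best fitness, which never increases because the global extremum is only replaced by a better particle.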
The parameters of the algorithm are the population size N, the particle dimensionality D, the number of particle iterations M, the inertia weight w, the learning factors $c_1$ and $c_2$, the maximum displacement $x_m$, and the maximum velocity $v_m$. The BP neural network uses a three-layer topology, and the position of each particle in the PSO algorithm encodes the connection weights of the BP neural network. The particle dimensionality D is therefore

$$ D = D_{in} \times D_h + D_h \times D_{out} \tag{2} $$

In Equation (2), $D_{in}$, $D_h$, and $D_{out}$ denote the number of neurons in the input, hidden, and output layers of the BP neural network, respectively.
In predicting the transmission capacity of the nodes, six factors, namely the number of download task threads, task progress, task length, two acquisition terminal time features (one of them in minutes), and data source time (in hours), are used as the six neurons of the input layer. The output layer has 1 neuron and the hidden layer has 10 neurons, so $D = 6 \times 10 + 10 \times 1 = 70$, i.e. there are 70 connection weights. For each particle's position $X_i = (x_{i,1}, \ldots, x_{i,70})$, $x_{i,1}$ to $x_{i,60}$ denote the weights from the input layer to the hidden layer, and $x_{i,61}$ to $x_{i,70}$ denote the weights from the hidden layer to the output layer.
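The mapping between a 70-dimensional particle and the two weight matrices can be sketched as below. The helper names `particle_dimension` and `decode` are hypothetical, and the row-major ordering (entry `i * d_h + j` links input i to hidden neuron j) is an assumption; the text only fixes which index ranges belong to which layer. Indices are 0-based here, so the paper's $x_{i,1}$ to $x_{i,60}$ correspond to `particle[0:60]`.

```python
D_IN, D_H, D_OUT = 6, 10, 1  # layer sizes stated in the text

def particle_dimension(d_in, d_h, d_out):
    """Equation (2): number of connection weights in a three-layer network (no biases)."""
    return d_in * d_h + d_h * d_out

def decode(particle, d_in=D_IN, d_h=D_H, d_out=D_OUT):
    """Hypothetical helper: split a particle into input->hidden and hidden->output weights.

    Assumes row-major order: particle[i * d_h + j] links input i to hidden neuron j.
    """
    assert len(particle) == particle_dimension(d_in, d_h, d_out)
    split = d_in * d_h
    w1 = [particle[i * d_h:(i + 1) * d_h] for i in range(d_in)]                        # d_in x d_h
    w2 = [particle[split + j * d_out:split + (j + 1) * d_out] for j in range(d_h)]   # d_h x d_out
    return w1, w2
```

For example, `decode(list(range(70)))` places entries 0 to 59 in a 6 × 10 matrix and entries 60 to 69 in a 10 × 1 matrix.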

Improvement of inertia weights:
The inertia weight (w) in Particle Swarm Optimization (PSO) determines how much of its previous velocity a particle retains in its current velocity. A smaller value of w favours local search, while a larger value favours global search. To balance the global and local search abilities of the particles, the inertia weight is modified using Equation (3):

$$ w = \frac{w_{start} + w_{end}}{2} + \frac{w_{start} - w_{end}}{2}\cos\left(\frac{k\pi}{M}\right) \tag{3} $$

In Equation (3), the initial and termination values of the inertia weight are denoted by $w_{start}$ and $w_{end}$, respectively, with $w_{start} > w_{end}$. The current number of particle iterations is k, and the maximum number of iterations is M. Because of the shape of the cosine function, w decreases slowly during both the initial and the final iterations. The algorithm therefore spends longer on global optimization at the beginning of the iterations, which reduces the risk of becoming trapped in a local optimum, and in the later stages it can perform a finer-grained local search with more incremental and precise adjustments.
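A cosine-decreasing inertia weight with the behaviour described for Equation (3) can be sketched as follows. This assumes the form $w = (w_{start}+w_{end})/2 + (w_{start}-w_{end})/2 \cdot \cos(k\pi/M)$; the function name `inertia_weight` is hypothetical, and the defaults 0.9 and 0.4 are illustrative values, not taken from Table 2.

```python
import math

def inertia_weight(k, m, w_start=0.9, w_end=0.4):
    """Hypothetical cosine schedule for the inertia weight (assumed form of Equation (3)).

    Equals w_start at k = 0 and w_end at k = m. Its slope is zero at both ends,
    so w changes slowly in the first and last iterations and fastest in the middle.
    """
    return (w_start + w_end) / 2 + (w_start - w_end) / 2 * math.cos(math.pi * k / m)
```

With M = 100, w falls from 0.9 to 0.65 at k = 50 and reaches 0.4 at k = 100; the first step (about 0.0001) is much smaller than a step near the middle (about 0.008).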

Improvement of learning factors:
In the PSO algorithm, the learning factors $c_1$ and $c_2$ are the acceleration coefficients that pull a particle toward its individual best and the global best, respectively. As with the inertia weight, $c_1$ and $c_2$ are adjusted dynamically so that the particles have good global search ability in the early iterations and improved precision and convergence speed in the later iterations. This approach maintains population diversity in the early stages of the search and improves search performance in the later stages. $c_1$ and $c_2$ are calculated according to Equation (4), in which the initial and termination values of the learning factor $c_1$ are denoted by $c_{start}$ and $c_{end}$, respectively, with $0 < c_{start} < c_{end} < 4$; M is the maximum number of iterations, and k is the current number of iterations.
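Dynamic learning factors can be illustrated with the widely used time-varying acceleration coefficients (TVAC) scheme, in which $c_1$ shrinks and $c_2$ grows linearly over the iterations. This is a swapped-in, commonly cited scheme used as a sketch; it is not necessarily the form of Equation (4), and the function name and default endpoints (2.5 and 0.5) are hypothetical.

```python
def learning_factors(k, m, c1_start=2.5, c1_end=0.5, c2_start=0.5, c2_end=2.5):
    """Hypothetical TVAC-style schedule (not necessarily Equation (4)).

    c1 (pull toward the particle's own best) decreases and c2 (pull toward the
    global best) increases linearly, favouring exploration early and convergence late.
    """
    t = k / m  # fraction of the iteration budget used so far
    c1 = c1_start + (c1_end - c1_start) * t
    c2 = c2_start + (c2_end - c2_start) * t
    return c1, c2
```

At k = 0 the pair is (2.5, 0.5), halfway through it is (1.5, 1.5), and at k = M it is (0.5, 2.5).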

Fitness function:
After the neural network is initialized, the absolute difference between the actual and predicted download speed of the node at the next moment is used as the fitness function of the PSO algorithm:

$$ F = \sum_{i=1}^{n} \left| y_i - y_i' \right| \tag{5} $$

In Equation (5), n is the number of training samples, X is the input to the neural network, i.e. the six input metrics described earlier, and $y_i$ and $y_i'$ are the node's actual and predicted download speeds. During the iterations, the velocity and displacement of the particles are updated using Equation (1), and the inertia weight and learning factors are calculated using Equations (3) and (4), respectively.
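The fitness evaluation can be sketched as a forward pass of the 6-10-1 network followed by the sum of absolute errors. The helpers `sigmoid`, `predict` and `fitness` are hypothetical; a linear output neuron, no bias terms, and row-major weight ordering are assumptions of this sketch (the text specifies only the sigmoid hidden layer).

```python
import math

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clip to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(particle, x, d_h=10):
    """Hypothetical forward pass of a d_in-d_h-1 network whose weights come from one particle.

    Assumes a sigmoid hidden layer, a linear output neuron, no biases, and
    particle[i * d_h + j] as the weight from input i to hidden neuron j.
    """
    d_in = len(x)
    hidden = [sigmoid(sum(x[i] * particle[i * d_h + j] for i in range(d_in)))
              for j in range(d_h)]
    return sum(hidden[j] * particle[d_in * d_h + j] for j in range(d_h))

def fitness(particle, samples, d_h=10):
    """Sum of absolute errors |y_i - y_i'| over (features, actual_speed) samples."""
    return sum(abs(y - predict(particle, x, d_h)) for x, y in samples)
```

Lower fitness means a better particle, so the PSO individual and global extrema track the minimum of this function.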

Data normalisation:
To improve training accuracy and reduce the errors caused by differences in the magnitude and scale of the input factors, the input data are normalized to the range [0, 1], which also makes the input neurons more sensitive. The normalization equation is:

$$ X_{i,norm} = \frac{X_i - X_{i,min}}{X_{i,max} - X_{i,min}} \tag{6} $$

In Equation (6), $X_i$ is the original value of input neuron i, $X_{i,norm}$ is its normalized result, and $X_{i,min}$ and $X_{i,max}$ are the minimum and maximum values of that input neuron.
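Per-feature min-max normalization as in Equation (6) can be sketched as follows. The helper `minmax_normalize` is hypothetical, and mapping a constant column to zeros (instead of dividing by zero) is an added assumption; the returned bounds are what inverse normalization needs later.

```python
def minmax_normalize(columns):
    """Hypothetical helper for Equation (6): scale each feature column to [0, 1].

    `columns` is a list of feature columns (lists of numbers). Returns the
    normalized columns and the (min, max) pair of each column.
    """
    normalized, bounds = [], []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo or 1.0  # assumption: a constant column maps to all zeros
        normalized.append([(v - lo) / span for v in col])
        bounds.append((lo, hi))
    return normalized, bounds
```

For instance, a thread-count column `[2, 4, 6]` becomes `[0.0, 0.5, 1.0]` with bounds `(2, 6)`.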

Data inverse normalization:
To compare the predicted results with the actual results and to calculate the fitness value of each particle, the normalization must be reversed for the network's predicted results. The inverse normalization equation is:

$$ Y_{i,norm} = Y_i \left(Y_{i,maxV} - Y_{i,minV}\right) + Y_{i,minV} \tag{7} $$

In Equation (7), $Y_{i,norm}$ is the result of the i-th dimension after de-normalization; $Y_i$ is the result of the i-th dimension before de-normalization, i.e. the normalized output of the prediction; and $Y_{i,maxV}$ and $Y_{i,minV}$ are the maximum and minimum values of the i-th dimension in the actual results.
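Inverse normalization as in Equation (7) is a single affine map. In this hypothetical sketch, the argument `y_norm` is the network's output in [0, 1] (the paper's $Y_i$) and the return value is the de-normalized speed (the paper's $Y_{i,norm}$); `y_min` and `y_max` are the minimum and maximum of the actual results.

```python
def inverse_normalize(y_norm, y_min, y_max):
    """Hypothetical helper for Equation (7): map a prediction in [0, 1] back to the original scale."""
    return y_norm * (y_max - y_min) + y_min

def inverse_normalize_all(preds, y_min, y_max):
    """Apply the inverse mapping to a list of normalized predictions."""
    return [inverse_normalize(p, y_min, y_max) for p in preds]
```

With actual speeds ranging from 2.0 to 10.0, normalized predictions `[0.0, 0.25, 1.0]` map back to `[2.0, 4.0, 10.0]`.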

Accuracy dynamic adjustment function:
To use low precision (a large expected error) in the early stages of BP neural network training and high precision (a small expected error) in the later stages, this paper introduces a precision dynamic adjustment function, Equation (8), which dynamically adjusts the error precision of the BP algorithm during training. In Equation (8), a is the initial accuracy, which is usually 1, b is the accuracy adjustment factor, and k is the current number of iterations.
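A tolerance that tightens with the PSO iteration count can be sketched as below. The geometric form $E(k) = a \cdot b^k$ with $0 < b < 1$ is a hypothetical stand-in, not necessarily Equation (8); only the roles of a (initial accuracy, usually 1), b (adjustment factor) and k (current iteration) come from the text, and the helper names are illustrative.

```python
def target_error(k, a=1.0, b=0.9):
    """Hypothetical accuracy adjustment function: E(k) = a * b**k.

    a is the initial accuracy (usually 1); b in (0, 1) is an assumed adjustment
    factor, so the BP stopping tolerance shrinks as the PSO iterations proceed.
    """
    if not 0.0 < b < 1.0:
        raise ValueError("b must lie in (0, 1) for the tolerance to shrink")
    return a * b ** k

def bp_should_stop(mean_abs_error, k, a=1.0, b=0.9):
    """BP training stops once its error reaches the tolerance for PSO iteration k."""
    return mean_abs_error <= target_error(k, a, b)
```

An error of 0.5 satisfies the tolerance at k = 0 (E = 1.0) but not at k = 10 (E ≈ 0.35), so BP trains longer in later PSO iterations.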
The execution flow of the PSO-BP algorithm is shown in Figure 2.

Experimental parameters setting
To validate the proposed PSO-BP algorithm, the BP and PSO-BP algorithms were evaluated in experiments on a real data set. Table 2 lists the parameter settings of the two algorithms. The input layer has 6 neurons, the hidden layer has 10 neurons, and the output layer has 1 neuron. The sigmoid function is used as the activation function of the hidden layer. The connection weights from the input layer to the hidden layer and from the hidden layer to the output layer are adjusted by gradient descent combined with a momentum factor.
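Gradient descent with a momentum factor can be sketched in its classical form, $\Delta w(t) = \mu\,\Delta w(t-1) - \eta\,\nabla E$, $w \leftarrow w + \Delta w(t)$. The helper name and the values $\eta = 0.1$ and $\mu = 0.9$ are hypothetical, not the settings in Table 2.

```python
def momentum_update(weights, grads, velocity, lr=0.1, momentum=0.9):
    """Hypothetical helper: one gradient-descent step with a momentum factor.

    delta(t) = momentum * delta(t - 1) - lr * grad;  w <- w + delta(t)
    Returns the updated weights and the new delta (velocity) terms.
    """
    new_velocity = [momentum * d - lr * g for d, g in zip(velocity, grads)]
    new_weights = [w + d for w, d in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

Applied repeatedly to the toy loss $(w - 3)^2$ with gradient $2(w - 3)$, starting from w = 0, the weight settles close to 3 within a few hundred steps; the momentum term carries over part of the previous step, which smooths oscillations between successive updates.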

Figure 1. The factors that affect the transmission ability of nodes in a network

Figure 3 compares the error rates of the two algorithms for the data in Table 3. The error behaviour of the two algorithms differs markedly: the BP predictions have a large variance, whereas the PSO-BP prediction error is relatively flat. The errors of both algorithms are mostly concentrated around 5%.

Figure 3. Comparison of error rates between the BP algorithm and the PSO-BP algorithm

Figure 4. Comparison of results between the BP algorithm and the PSO-BP algorithm

This paper proposes an improved PSO-BP algorithm that introduces the accuracy dynamic adjustment function and the learning factor adjustment function. Finally, the improved PSO-BP algorithm and the BP algorithm are compared on the real data set to verify the feasibility of the prediction model and of the improved PSO-BP algorithm.

Table 1 shows the selected features that affect network transmission capability.

Table 1. Selected features affecting network transmission capability

Table 2. Algorithm parameter settings

Table 3. Selected prediction results

Table 4. Evaluation of algorithm prediction results

As Table 4 shows, the mean relative error and root mean square error of the PSO-BP algorithm are smaller than those of the BP algorithm, which indicates that the PSO-BP algorithm gives better predictions. Table 5 shows the number of test samples in different error-rate ranges for each of the two algorithms. The proposed PSO-BP algorithm achieves better accuracy than the traditional BP algorithm.
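The evaluation metrics can be sketched as below: the per-sample error rate $|y - y'|/|y|$, the mean relative error, the root mean square error, and a count of test samples per error-rate range. The helper names are hypothetical, and the range boundaries (5%, 10%, 20%) are illustrative; Table 5 defines its own ranges.

```python
import math

def relative_errors(actual, predicted):
    """Per-sample error rate |y - y'| / |y| (actual values must be non-zero)."""
    return [abs(y - p) / abs(y) for y, p in zip(actual, predicted)]

def mean_relative_error(actual, predicted):
    errs = relative_errors(actual, predicted)
    return sum(errs) / len(errs)

def rmse(actual, predicted):
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))

def error_histogram(actual, predicted, edges=(0.05, 0.10, 0.20)):
    """Count samples per error-rate range: <5%, 5-10%, 10-20%, >=20% (illustrative edges).

    `edges` must be sorted in ascending order.
    """
    counts = [0] * (len(edges) + 1)
    for e in relative_errors(actual, predicted):
        counts[sum(e >= edge for edge in edges)] += 1
    return counts
```

For actual speeds `[10, 20, 40]` and predictions `[10.2, 21.4, 50]`, the error rates are 2%, 7% and 25%, so the histogram is `[1, 1, 0, 1]` and the RMSE is $\sqrt{34} \approx 5.83$.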

Table 5. Error statistics on the test set