HIGH-PERFORMANCE TWO-LEVEL PARALLEL COMPUTING SCHEME FOR NANOPARTICLES DETECTION IN SEM IMAGES

In our previous work, an exponential approximation (EA) method was proposed for the detection of nanoparticles in scanning electron microscope (SEM) images. It shows the best quality of nanoparticle detection compared to other methods, but its main drawback is its long running time. In this paper, we propose a two-level parallel computing scheme and a corresponding high-performance Python+MPI implementation of the EA method. Experiments have shown that the developed parallel implementation can significantly speed up the computational process.


INTRODUCTION
Electron microscopy is one of the methods for studying the microstructure of solids, their electric and magnetic fields, and local composition using a combination of electron probe methods. Today it is one of the most effective and advanced research methods, widely used at enterprises and in scientific and educational laboratories. The resolving power of scanning electron microscopes (SEM) makes it possible to obtain a magnification of millions of times, which is ideal for studying microscopic objects (Klein et al., 2012).
Scanning electron microscopy can be used to solve many urgent practical problems. In particular, this paper deals with the problem of mapping "hidden" defects on the surface of carbon materials that cannot be detected by other methods (Boiko, 2021). The solution of this problem is very important for many industrial processes (Mao, 2021), including processes that use such an important catalyst as palladium on carbon (Liu, 2018; Felpin, 2006), since the presence of defects affects the properties of catalysts and their ability to accelerate chemical processes.
The mapping methodology consists in applying metal nanoparticles 1-5 nm in size as markers on the surface under study and then photographing it with an electron microscope. A direct relationship between the presence of defects and the ordering of nanoparticles was studied experimentally in (Boiko, 2021). Due to the predominant association of nanoparticles with defects, SEM images obtained in this way correspond with sufficient accuracy to the distribution of defects (Fei, 2013; Pentsak, 2015).
The problem is that chemists analyze SEM images manually, while hundreds and even thousands of images are produced to study a single sample of the material. This makes manual analysis too time-consuming and calls for automating the detection of nanoparticles and the analysis of their relative positions in images.
Visually, nanoparticles are small bright areas, usually rounded. The complexity of their automatic detection lies in the fact that the images, as a rule, are very noisy, the nanoparticles themselves are very small, and, moreover, several nanoparticles can be located very close to each other and even partially overlap.
In our previous work (Boiko, 2022) the exponential approximation (EA) method was proposed to detect nanoparticles. According to experiments, it showed the best quality of nanoparticle detection compared to other methods. The main disadvantage of the EA method is that it is time-consuming.
For this reason, we propose here a two-level parallel computing scheme and a corresponding high-performance Python+MPI implementation of the EA method. The choice of software implementation tools was significantly constrained by the Python programming language, in which the sequential version of the EA algorithm had previously been implemented.
Experiments have shown that the developed parallel implementation can significantly speed up the computational process.

EXPONENTIAL APPROXIMATION (EA) METHOD FOR NANOPARTICLES DETECTION
The exponential approximation (EA) method is based on two assumptions. The first assumption is that the visual representation of each nanoparticle in the image can be modeled quite well using an exponential brightness function with a maximum at the center of the nanoparticle. The second assumption is that the parameters of the approximating functions found for small image fragments will differ between fragments that contain a nanoparticle and fragments that do not.
Processing one image with the EA method consists of five stages: preprocessing, extracting small image fragments, performing exponential approximation for each image fragment, detecting fragments containing a nanoparticle, and determining the radius of each particle.

Preprocessing
To exclude overly bright (overexposed) areas of the image, the Top-Hat transform (Bright, 1987) is applied before the main processing.
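As an illustration of this stage, the following is a minimal pure-Python sketch of a white top-hat transform (the image minus its morphological opening) with a square window. The function names `window_filter` and `white_top_hat` and the window size are assumptions made for this example; the source does not specify the structuring element or the implementation used.

```python
def window_filter(img, size, reduce_fn):
    """Apply reduce_fn (min or max) over a size x size window clipped at the borders."""
    h, w = len(img), len(img[0])
    r = size // 2
    return [
        [
            reduce_fn(
                img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def white_top_hat(img, size=3):
    """White top-hat: image minus its opening (erosion followed by dilation).

    Bright structures larger than the window survive the opening and are
    therefore removed by the subtraction; small bright spots are kept.
    """
    eroded = window_filter(img, size, min)      # erosion
    opened = window_filter(eroded, size, max)   # dilation of the eroded image
    return [
        [img[i][j] - opened[i][j] for j in range(len(img[0]))]
        for i in range(len(img))
    ]
```

With a window somewhat larger than the expected nanoparticle size, large overexposed regions are suppressed while small bright spots remain.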

Extracting small image fragments
According to the EA method, at this stage square image fragments of a small fixed size $f_{size}$ are selected, whose centers correspond to the supposed centers of the nanoparticles (with the exception of fragments close to the image borders, which are truncated and have a displaced center). To reduce the number of calculations, a pre-filtering threshold $Th_{pref}$ is applied to the fragments.
Fragments whose brightness at the central point is below this threshold are excluded from consideration, because the center of a nanoparticle is unlikely to have a low brightness value. The threshold $Th_{pref}$ is a structural parameter of the method.
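The selection and pre-filtering of fragments can be sketched as follows. This is only an illustrative sketch: the function name `extract_fragments` and the representation of a fragment as a list of `(dy, dx, value)` triples relative to the candidate center are assumptions of this example, not the authors' data structures.

```python
def extract_fragments(img, f_size=7, th_pref=10):
    """Return [((i, j), fragment), ...] for all candidate centers.

    A pixel is a candidate center if its brightness is not below th_pref.
    Each fragment is a list of (dy, dx, value) triples, where (dy, dx) are
    offsets from the candidate center. Near the image borders the fragment
    is truncated, so the center is displaced within it.
    """
    h, w = len(img), len(img[0])
    r = f_size // 2
    fragments = []
    for i in range(h):
        for j in range(w):
            if img[i][j] < th_pref:
                continue  # pre-filter: center too dark to be a nanoparticle
            rows = range(max(0, i - r), min(h, i + r + 1))
            cols = range(max(0, j - r), min(w, j + r + 1))
            frag = [(y - i, x - j, img[y][x]) for y in rows for x in cols]
            fragments.append(((i, j), frag))
    return fragments
```

The default values `f_size=7` and `th_pref=10` mirror the parameter values reported for the experiments in this paper.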

Performing exponential approximation for each image fragment
Let $f = [f(x_i, y_j)],\ i, j = 1, \ldots, f_{size}$ be a small image fragment of size $f_{size}$. For all fragments, we use the same element indexing $x_i, y_j \in \{-\lfloor f_{size}/2 \rfloor, \ldots, \lfloor f_{size}/2 \rfloor\}$, constructed with respect to the center with coordinates $x_{\lfloor f_{size}/2 \rfloor} = 0$ and $y_{\lfloor f_{size}/2 \rfloor} = 0$.
To approximate each fragment, an exponential function of the form (1), with parameters $a$ and $b$, is used. The approximation is carried out in two steps. At the first step, for each value of the parameter $b$ from some finite set $b \in \{b_1, \ldots, b_n\}$, the optimal value of the parameter $a$ is determined as the argument that provides the minimum value of the root-mean-square approximation criterion. In this case, according to the least squares method, the optimal value of the parameter $a$ can be easily calculated in closed form. At the second step, those values of the parameters $b_k$ and $a(b_k)$ that provide the minimum value of the criterion are taken as the optimal values of the desired parameters of approximating function (1).
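The two-step procedure can be illustrated by the sketch below. Since the explicit form of the approximating function is not reproduced in this text, the sketch assumes, purely for illustration, the model $a \cdot \exp(-(x^2 + y^2)/b^2)$ and a sum-of-squares criterion; the function name `fit_fragment` and the fragment representation as `(dy, dx, value)` triples are likewise assumptions. For a fixed $b$ the model is linear in $a$, so the least squares solution is $a(b) = \sum f g_b / \sum g_b^2$, where $g_b$ is the exponential factor.

```python
import math


def fit_fragment(frag, b_values):
    """Grid search over b with a closed-form least squares estimate of a.

    frag: list of (dy, dx, value) triples relative to the fragment center.
    ASSUMED model (not taken from the source): value ~ a * exp(-(dx^2 + dy^2) / b^2).
    Returns (a_opt, b_opt, squared_error).
    """
    best = None
    for b in b_values:
        # Exponential factor g_b evaluated at every pixel of the fragment.
        g = [math.exp(-(dx * dx + dy * dy) / (b * b)) for dy, dx, _ in frag]
        # Step 1: optimal a for this b (linear least squares in one unknown).
        a = sum(v * gi for (_, _, v), gi in zip(frag, g)) / sum(gi * gi for gi in g)
        err = sum((v - a * gi) ** 2 for (_, _, v), gi in zip(frag, g))
        # Step 2: keep the pair (a, b) with the smallest criterion value.
        if best is None or err < best[2]:
            best = (a, b, err)
    return best
```

The cost of one fit is proportional to the number of candidate values of $b$ times the number of pixels in the fragment, and the fit is repeated for every candidate fragment, which is consistent with this stage being the most time-consuming one.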

Detecting fragments containing a nanoparticle
As a result of the exponential approximation of image fragments, we get two matrices of coefficients $a^{opt} = [a^{opt}_{i,j}]$ and $b^{opt} = [b^{opt}_{i,j}]$, $i = 1, \ldots, sizeX$, $j = 1, \ldots, sizeY$, whose size matches the size of the analyzed image.
To determine the fragments containing nanoparticles, we find all local maxima of the matrix $a^{opt}$ and take only those whose value exceeds an adaptive threshold. The threshold is determined by $mean(LM(a^{opt}))$, the average value of all found local maxima, and by the coefficient $Cf_{detect} \in [0, 1]$, which is another structural parameter of the exponential approximation method.
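A possible form of this detection step is sketched below. Two assumptions are made that are not confirmed by this text: the adaptive threshold is taken as the product $Cf_{detect} \cdot mean(LM(a^{opt}))$, and a local maximum is defined as a value strictly greater than all of its 8-connected neighbors. The function names are hypothetical.

```python
def local_maxima(a_opt):
    """Return [(i, j, value)] for cells strictly greater than all 8 neighbors."""
    h, w = len(a_opt), len(a_opt[0])
    peaks = []
    for i in range(h):
        for j in range(w):
            v = a_opt[i][j]
            neighbors = [
                a_opt[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
                if (y, x) != (i, j)
            ]
            if all(v > n for n in neighbors):
                peaks.append((i, j, v))
    return peaks


def detect_centers(a_opt, cf_detect=0.4):
    """Keep local maxima above the ASSUMED threshold cf_detect * mean(local maxima)."""
    peaks = local_maxima(a_opt)
    if not peaks:
        return []
    threshold = cf_detect * sum(v for _, _, v in peaks) / len(peaks)
    return [(i, j) for i, j, v in peaks if v > threshold]
```

Because the threshold is derived from the local maxima of the image itself, it adapts to the overall brightness level of each image.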

Determining the radius of each particle
The radius of the nanoparticle centered at the point $(i, j)$ is determined by the corresponding value $b^{opt}_{i,j}$.

TWO-LEVEL PARALLEL COMPUTING SCHEME

Level 1: Parallel detecting nanoparticles in a single image
It is easy to see that stages 2, 3 and 5 of the exponential approximation method described in the previous section are sets of independent calculations, each processing an individual fragment of the analyzed image. However, parallel computing turns out to be appropriate only for the most time-consuming third stage, the exponential approximation itself. The computational complexity of the second and fifth stages is low, so the overhead of organizing parallel computing at these stages exceeds the possible gain from parallelization. The corresponding scheme of parallel computing is shown in Figure 1.
It should be noted that applied research may require the simultaneous processing of hundreds and even thousands of images, with the processing results saved as text and graphic files. The large number of images to be processed, together with the low disk performance when saving the results, makes it relevant to add a second level of parallelism, which corresponds to parallel image processing.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-2/W3-2023 ISPRS Intl.Workshop "Photogrammetric and computer vision techniques for environmental and infraStructure monitoring, Biometrics and Biomedicine" PSBB23, 24-26 April 2023, Moscow, Russia

Software implementation of a two-level scheme of parallel computing
An important condition for the development of a parallel implementation of the exponential approximation method was the use of the Python programming language, in which the sequential version of the algorithm is implemented. This condition imposes significant restrictions on the structure of the parallel computing scheme and on the choice of implementation tools.
Since distributing intensive computations between Python threads does not improve performance because of the global interpreter lock (GIL), process-based parallelization is used at both levels in this work.
Namely, at level 1 the Python multiprocessing pool is used. This implementation of multiprocessing is convenient when there is a set of tasks, each of which can be solved independently of the others. Here, such tasks are the exponential approximations of the individual small fragments of the analyzed image.
To organize the parallel execution of the exponential approximation of individual image fragments, a task pool is formed, in which each task is characterized by its initial data (an image fragment), a set of parameters, and the results of some auxiliary computations. Because of the overhead of organizing parallel computing, it is more expedient to use a number of processes significantly smaller than the number of tasks and to distribute the work among the processes as evenly as possible.
After the calculations are completed, the result of each task (the coordinates of the center of the processed fragment and the optimal values of the approximation parameters found) is stored in a common list in the memory space of the main process. After the list is fully formed, it is used to fill in the matrices of optimal parameter values, on the basis of which, in the next steps, the fragments containing nanoparticles are determined and their radii are calculated.
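The level 1 organization described above can be sketched with the standard `multiprocessing.Pool`. This is a simplified illustration rather than the authors' code: the names `fit_task`, `assemble` and `run_level1`, the task layout, and the exponential model inside `fit_task` (assumed here as $a \cdot \exp(-(x^2 + y^2)/b^2)$) are hypothetical.

```python
import math
from multiprocessing import Pool

# Hypothetical grid of candidate values for the parameter b.
B_VALUES = [round(1.0 + 0.1 * k, 1) for k in range(61)]


def fit_task(task):
    """One independent task: fit a single fragment, return (i, j, a_opt, b_opt)."""
    (i, j), frag = task  # frag: list of (dy, dx, value) triples
    best = None
    for b in B_VALUES:
        # ASSUMED model, for illustration only.
        g = [math.exp(-(dx * dx + dy * dy) / (b * b)) for dy, dx, _ in frag]
        a = sum(v * gi for (_, _, v), gi in zip(frag, g)) / sum(gi * gi for gi in g)
        err = sum((v - a * gi) ** 2 for (_, _, v), gi in zip(frag, g))
        if best is None or err < best[0]:
            best = (err, a, b)
    return i, j, best[1], best[2]


def assemble(results, shape):
    """Fill the matrices of optimal parameters from the common result list."""
    h, w = shape
    a_opt = [[0.0] * w for _ in range(h)]
    b_opt = [[0.0] * w for _ in range(h)]
    for i, j, a, b in results:
        a_opt[i][j], b_opt[i][j] = a, b
    return a_opt, b_opt


def run_level1(tasks, shape, processes=4):
    """Distribute the tasks evenly over a small number of worker processes."""
    chunk = max(1, math.ceil(len(tasks) / processes))  # one large chunk per process
    with Pool(processes) as pool:
        results = pool.map(fit_task, tasks, chunksize=chunk)  # list in the main process
    return assemble(results, shape)
```

When used from a script, `run_level1` should be called under an `if __name__ == "__main__":` guard, which is required on platforms where worker processes are started by spawning. Passing a large `chunksize` keeps the number of inter-process transfers small, in line with using far fewer processes than tasks.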
To implement parallel image processing at level 2, Message Passing Interface (MPI) technology is used. It should be noted that, in contrast to level 1, information exchange between processes is not required in this case. MPI is actually used only to start independent image processing tasks while ensuring that the processors are loaded as uniformly as possible.
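One simple way to obtain such a distribution is a static round-robin split of the image list by process rank, sketched below. The source does not state which distribution strategy is actually used, so this is an assumption; the function name `images_for_rank` and the file names are hypothetical. The rank and the number of processes would normally be obtained from `mpi4py`, as indicated in the comments.

```python
def images_for_rank(paths, rank, size):
    """Round-robin split: process `rank` out of `size` takes every size-th image.

    The parts differ in length by at most one image, and no communication
    between processes is needed to agree on the split.
    """
    return paths[rank::size]


# Typical use under MPI (requires mpi4py and a launcher such as mpirun):
#
#     from mpi4py import MPI
#     comm = MPI.COMM_WORLD
#     my_images = images_for_rank(all_paths, comm.Get_rank(), comm.Get_size())
#     for path in my_images:
#         process_image(path)  # hypothetical: level 1 processing plus saving results
```

A static split of this kind balances the number of images per process; since the running time depends on the number of nanoparticles in each image, it balances the actual load only approximately.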

Characteristics of computing systems
An experimental study of the developed parallel software tool was carried out on two computing systems, the main characteristics of which are given in Table 1.

Performance study of parallel detection of nanoparticles on a single image
The operation time of the EA method depends on the number of nanoparticles in the image. In this regard, three types of SEM images containing significantly different numbers of nanoparticles were chosen for the experiments (Figure 2). In all experiments of this paper, the exponential approximation method was run with the parameters that ensured the best quality of nanoparticle detection according to preliminary studies (Boiko, 2022): fragment size for approximation $f_{size} = 7$; possible values of nanoparticle radii from 1 to 7 with a step of 0.1; threshold for pre-filtering fragments $Th_{pref} = 10$; adaptive coefficient for detecting fragments containing nanoparticles $Cf_{detect} = 0.4$. All experiments of this section were carried out on System1, whose characteristics are presented in Table 1.
For each image, the EA method was run using a different number of processes $p \in \{1, 2, 4, 6, 8, 10, 12, 14, 16\}$. For each run, the running time of the method was recorded. The quality of nanoparticle detection does not change with the number of processes and is therefore of no interest here.
Table 2 contains the running times of the proposed algorithm averaged over 5 runs, as well as the obtained speedup $S$ and efficiency $E$ of the use of processor time.
The speedup is the ratio of the execution time of the best sequential algorithm to the execution time of the parallel algorithm (for a fixed number of processes) and is calculated by the formula:
$$S = \frac{T_1}{T_p},$$
where $T_1$ is the running time of the sequential algorithm and $T_p$ is the running time of the parallel algorithm on $p$ processes.
Another important characteristic of a parallel computing process is efficiency. It is the ratio of the speedup to the corresponding number of processes:
$$E = \frac{S}{p},$$
where $S$ is the speedup of the algorithm for $p$ processes.
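For reference, both characteristics can be computed from measured running times as in the short sketch below. The function names are illustrative, and the numbers in the usage note are hypothetical and are not taken from the tables of this paper.

```python
def speedup(t1, tp):
    """Speedup S: sequential running time divided by parallel running time."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Efficiency E: speedup divided by the number of processes p."""
    return speedup(t1, tp) / p
```

For example, with hypothetical times of 120 s for the sequential run and 12 s on 12 processes, `speedup(120.0, 12.0)` gives a tenfold speedup and `efficiency(120.0, 12.0, 12)` gives 10/12, that is, about 0.83.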
Efficiency can take values from 0 to 1. The ideal efficiency value is E = 1, which means that all processors are fully loaded throughout the entire computational process. The obtained dependencies are presented graphically in Figure 3. As we can see, in all cases the average running time decreases significantly as the number of processes increases up to 12, providing a fairly high speedup (almost 10 times) and good efficiency in using processor time. A further increase in the number of processes is inexpedient because of the sequential parts of the algorithm and the overhead of collecting the results obtained by different processes.

Performance study of image pool processing
In this experiment we study performance of the proposed parallel algorithm with two levels of parallelism.
A data set of 322 SEM images of material S1 from the database (Boiko, 2020) was used.
For all parameters of the EA method, the same values were set as in the experiment from section 4.2.
For the full data set of 322 images, the proposed method was run with different numbers of processes $p_1$ (for level 1) and $p_2$ (for level 2) on System2, whose main characteristics are specified in Table 1. The resulting times (in minutes) and the respective speedups, computed relative to the sequential processing time of 1239.70 minutes, are presented in Table 3. As we can see from Table 3 and Figures 4-7, organizing parallel computing both at the stage of detecting nanoparticles and at the stage of processing the pool of images increases the performance of the developed software and, accordingly, significantly reduces the time required for processing the pool of images. At the same time, it should be noted that the greatest speedup is achieved with the two-level parallel computing scheme proposed and implemented in this paper, rather than with parallel processing at only one of the levels. The highest acceleration corresponds to the number of cores in the system and, accordingly, to the maximum number of parallel processes. Thus, the proposed two-level system allows us to fully use all computing resources of the system and get the maximum possible acceleration.
Interestingly, for a fixed number of processes $p_2$ at the image level, an increase in the number of processes $p_1$ at the level of nanoparticles (even if the total number of processes exceeds the available number of processor cores) makes it possible to reduce the total operating time. However, the optimal number of processes at each level, according to this experimental study, is 12.

CONCLUSION
This paper proposes a two-level parallel computing scheme and a corresponding high-performance Python+MPI implementation of the exponential approximation method for nanoparticle detection in scanning electron microscope images. Experiments on a real data set of 322 SEM images, carried out on the high-performance server and the supercomputer system (Voevodin, 2019), show that the proposed computing scheme and its implementation substantially reduce the computing time.
The highest acceleration corresponds to the number of cores in the system and, accordingly, to the maximum number of parallel processes. Thus, the proposed two-level computing scheme allows us to fully use all resources of the computing system and get the maximum possible acceleration.

Figure 1: Parallel computing scheme at the level of detecting nanoparticles

Figure 2: SEM images used to study the performance of parallel nanoparticle detection

Figure 3: Dependencies of average time, acceleration (speedup) and efficiency on the number of processes for parallel nanoparticle detection

Figure 4: Dependencies of time on the number of processes at the level of detecting nanoparticles

Figure 5: Dependencies of speedup on the number of processes at the level of detecting nanoparticles

Figure 7: Dependencies of speedup on the number of processes at the level of image processing pool

Table 1: Main characteristics of computing systems

Table 2: Performance evaluation results for parallel nanoparticle detection in a single image

Table 3: Performance evaluation results for parallel nanoparticle detection in a single image

Figures 4 and 5 present, respectively, the dependencies of time (in minutes) and speedup on the number of processes $p_2$ at the level of the image processing pool. Different charts in these figures correspond to different numbers of processes $p_1$ at the level of detecting nanoparticles in a single image. Figures 6 and 7 present