The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XLVIII-1/W2-2023
https://doi.org/10.5194/isprs-archives-XLVIII-1-W2-2023-139-2023
13 Dec 2023

ON THE ACCURACY OF YOLOV8-CNN REGARDING DETECTION OF HUMANS IN NADIR AERIAL IMAGES FOR SEARCH AND RESCUE APPLICATIONS

J. Berndt, H. Meißner, and T. Kraft

Keywords: Deep Learning, YoloV8, Human Detection, CNN, Convolutional Neural Network, UAV, Aerial Images

Abstract. The use of deep learning techniques, especially in conjunction with convolutional neural networks (CNNs), has attracted major attention in the remote sensing community. The main use cases are object detection, image classification and image segmentation. This paper focuses on object detection, specifically on the detection of humans. In search and rescue applications it is common to map larger areas with downward-facing cameras. However, many training data sets for CNNs consist of oblique images, which differ strongly from the nadir aerial images used for real-time maps.
To circumvent this issue, a unique data set was created. It contains solely nadir images at different ground sample distances (GSD), varying from one to five centimetres. Diversity of the training data is ensured through various flights with an unmanned aerial vehicle (UAV) at different locations. The GSD is valuable prior knowledge, because the difficulty of detecting humans in aerial images depends on it. An image depicting a human at one centimetre GSD contains much more information than an image depicting the same human at three centimetres GSD. That is one reason why networks trained on a variety of ground sample distances may struggle to detect humans reliably at a particular GSD.
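The effect of GSD on the information available per person can be illustrated with a rough pixel count. The sketch below is only an approximation: it assumes a person seen from directly above covers a square of about 0.5 m by 0.5 m, a hypothetical value that is not taken from the paper, and the function name `pixels_on_target` is likewise illustrative.

```python
def pixels_on_target(extent_m: float, gsd_cm: float) -> float:
    """Approximate pixel count covering a square target of side `extent_m`.

    `gsd_cm` is the ground sample distance, i.e. the ground length
    covered by one pixel, given in centimetres.
    """
    side_px = extent_m * 100.0 / gsd_cm  # target side length in pixels
    return side_px ** 2


# Assumption (not from the paper): nadir footprint of a person, in metres.
HUMAN_EXTENT_M = 0.5

for gsd_cm in (1, 3, 5):
    px = pixels_on_target(HUMAN_EXTENT_M, gsd_cm)
    print(f"GSD {gsd_cm} cm: about {px:.0f} pixels on the person")
```

Because the pixel count scales with the inverse square of the GSD, going from one to three centimetres reduces the pixels on the person by a factor of nine, regardless of the footprint assumed.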
The unique data set consists of four subsets (divided by GSD). Each subset contains 1000 manually annotated humans, augmented by rotation and colour shift, resulting in 12000 training samples used to train the newly released YoloV8 CNN. The entire training and test process is unified to ensure comparable input conditions.
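The sample counts stated above can be checked with a few lines of arithmetic. This is a bookkeeping sketch only: the variable names are illustrative, the per-subset figure assumes the augmented samples are spread evenly across the four subsets, and how the resulting factor splits between rotation and colour shift is not stated in the abstract.

```python
SUBSETS = 4                     # subsets, divided by GSD
ANNOTATIONS_PER_SUBSET = 1000   # manually annotated humans per subset
TOTAL_TRAINING_SAMPLES = 12000  # total after augmentation

# Manually annotated humans before augmentation.
annotated = SUBSETS * ANNOTATIONS_PER_SUBSET

# Implied number of training samples per manual annotation.
augmentation_factor = TOTAL_TRAINING_SAMPLES / annotated

# Assumption: augmented samples are distributed evenly over the subsets.
samples_per_subset = TOTAL_TRAINING_SAMPLES // SUBSETS

print(f"manual annotations:  {annotated}")
print(f"augmentation factor: {augmentation_factor:g}")
print(f"samples per subset:  {samples_per_subset}")
```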