<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Archives</journal-id>
<journal-title-group>
<journal-title>The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Archives</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9034</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-archives-XLIII-B3-2022-1189-2022</article-id>
<title-group>
<article-title>UNSUPERVISED HARMONIOUS IMAGE COMPOSITION FOR DISASTER VICTIM DETECTION</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhang</surname>
<given-names>N.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nex</surname>
<given-names>F.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-5712-6902">https://orcid.org/0000-0002-5712-6902</ext-link></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Vosselman</surname>
<given-names>G.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-8813-8028">https://orcid.org/0000-0001-8813-8028</ext-link></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kerle</surname>
<given-names>N.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, the Netherlands</addr-line>
</aff>
<pub-date pub-type="epub">
<day>31</day>
<month>05</month>
<year>2022</year>
</pub-date>
<volume>XLIII-B3-2022</volume>
<fpage>1189</fpage>
<lpage>1196</lpage>
<permissions>
<copyright-statement>Copyright: © 2022 N. Zhang et al.</copyright-statement>
<copyright-year>2022</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.html">This article is available from https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.html</self-uri>
<self-uri xlink:href="https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.pdf">The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.pdf</self-uri>
<abstract>
<p>Deep detection networks trained on large amounts of annotated data achieve high accuracy in detecting various objects, such as pedestrians, cars, and lanes, and such models have been deployed in many scenarios. A disaster victim detector would be very useful for finding victims who are partially buried by debris after an earthquake or building collapse. However, because large quantities of real images with buried victims are difficult to obtain for training, a deep detection model cannot realize its full potential on this task. In this paper we generate realistic images for training a victim detector. We first randomly cut human body parts out of an open-source human data set and paste them into ruins background images. Then, we propose an unsupervised generative adversarial network (GAN) to harmonize the body parts so that they fit the style (illumination, texture, and color characteristics) of the background. These generated images are finally used to fine-tune the YOLOv5 detection network. We evaluate the AP (average precision) both at IoU (Intersection over Union) 0.5 and averaged over IoU ∈ [0.5:0.05:0.95], denoted <italic>AP</italic>@0.5 and <italic>AP</italic>@[.5:.95], respectively. The experimental results show that YOLOv5l pre-trained on the COCO data set performs poorly at detecting victims, with an <italic>AP</italic>@[.5:.95] of only 19.5%. The model fine-tuned on our composite images detects victims effectively, increasing the <italic>AP</italic>@[.5:.95] to 33.6% and the <italic>AP</italic>@0.5 from 32.4% to 53.4%. Our unsupervised harmonization method further improves these results by 2.1% and 6.1%, respectively.</p>
</abstract>
<counts><page-count count="8"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>
