<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">ISPRS-Archives</journal-id>
<journal-title-group>
<journal-title>The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences</journal-title>
<abbrev-journal-title abbrev-type="publisher">ISPRS-Archives</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2194-9034</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/isprs-archives-XLIII-B3-2022-1189-2022</article-id>
<title-group>
<article-title>UNSUPERVISED HARMONIOUS IMAGE COMPOSITION FOR DISASTER VICTIM DETECTION</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhang</surname>
<given-names>N.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nex</surname>
<given-names>F.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-5712-6902">https://orcid.org/0000-0002-5712-6902</ext-link></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Vosselman</surname>
<given-names>G.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-8813-8028">https://orcid.org/0000-0001-8813-8028</ext-link></contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kerle</surname>
<given-names>N.</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, the Netherlands</addr-line>
</aff>
<pub-date pub-type="epub">
<day>31</day>
<month>05</month>
<year>2022</year>
</pub-date>
<volume>XLIII-B3-2022</volume>
<fpage>1189</fpage>
<lpage>1196</lpage>
<permissions>
<copyright-statement>Copyright: © 2022 N. Zhang et al.</copyright-statement>
<copyright-year>2022</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.html">This article is available from https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.html</self-uri>
<self-uri xlink:href="https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.pdf">The full text article is available as a PDF file from https://isprs-archives.copernicus.org/articles/isprs-archives-XLIII-B3-2022-1189-2022.pdf</self-uri>
<abstract>
<p>Deep detection networks trained on large amounts of annotated data achieve high accuracy in detecting various objects, such as pedestrians, cars, and lanes, and such models have been deployed in many scenarios. A disaster victim detector would be very useful for finding victims who are partially buried by debris after an earthquake or building collapse. However, because large quantities of real images with buried victims are difficult to obtain for training, a deep detection model cannot realize its full potential on this task. In this paper we generate realistic images for training a victim detector. We first randomly cut human body parts out of an open-source human data set and paste them into ruins background images. Then, we propose an unsupervised generative adversarial network (GAN) to harmonize the body parts so that they fit the style (illumination, texture, and color characteristics) of the background. These generated images are finally used to fine-tune the YOLOv5 detection network. We evaluate the AP (average precision) both at IoU (Intersection over Union) 0.5 and averaged over IoU ∈ [0.5:0.05:0.95], denoted <italic>AP</italic>@0.5 and <italic>AP</italic>@[.5:.95], respectively. The experimental results show that YOLOv5l pre-trained on the COCO data set performs poorly at detecting victims, with an <italic>AP</italic>@[.5:.95] of only 19.5%. The model fine-tuned on our composite images detects victims effectively, increasing the <italic>AP</italic>@[.5:.95] to 33.6% and the <italic>AP</italic>@0.5 from 32.4% to 53.4%. Our unsupervised harmonization method further improves these results by 2.1% and 6.1%, respectively.</p>
</abstract>
<counts><page-count count="8"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>
