Comparison of manual and semi-automated synthetic training data creation for individual tree crown delineation
Keywords: Manual Labeling, Synthetic Training Data, Individual Tree Crown Delineation
Abstract. Deep learning models for individual tree detection and crown delineation (ITDCD) rely on large, high-quality annotation datasets to produce accurate predictions. In most ITDCD studies, the training annotations are collected through manual labeling. Manual labeling, especially of complex structures such as tree crowns, is time-consuming and often yields error-prone annotations, which in turn can lead to significant errors in the predictions of deep learning models. Semi- or fully automated training data creation has the potential to make the creation process more efficient and to ensure a high-quality training dataset. In this work, we present a semi-automated methodology for generating synthetic training data for deep learning-based ITDCD applications. Furthermore, we systematically compare the manual and synthetic training data creation methods against four criteria (validity, efficiency, variety, and scalability) to illustrate the advantages and disadvantages of the two approaches in a structured and practical way. Overall, the semi-automated synthetic data approach outperforms manual labeling in terms of validity, efficiency, and scalability: once the algorithm is implemented, it rapidly generates arbitrarily large, high-quality, reproducible tree crown annotation datasets. In contrast, manual labeling has the advantage when only a small, low-quality dataset is needed (e.g., for fine-tuning a pre-trained model), where it is more efficient than developing a semi-automated method from scratch.
