Semantically-Based Animal Pose Estimation in the Wild
Keywords: Semantic Segmentation, Animal Pose Estimation, Deep Learning, Wild Animals
Abstract. Accurate animal pose estimation in the wild is potentially useful for many downstream applications, such as wildlife conservation. Currently, the main approach to estimating animal poses is to identify body keypoints and construct a skeleton from them. However, directly applying human pose estimation frameworks to animals is not successful because of the differences in skeletal structure between humans and other mammals. In this study, we propose a two-stage method: coarse-tuning with animal detection using a bounding box, as is done in most similar methods, and fine-tuning with semantic segmentation of the animal. The YOLOv8 Pose Estimation and Pose Keypoint Classification model was chosen as the base model for keypoint extraction. Extensive training experiments were conducted using the AwA2 dataset (with a small number of samples from our own dataset), the AP-10K dataset, and the Tiger-Pose dataset. The trained model was tested on our own dataset collected from camera traps in the Ergaki National Park, Russia. Experimental results show that the proposed algorithm with additional semantic segmentation increases the accuracy of animal pose estimation by 3.6–4.8% on samples of the Ergaki dataset.
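To make the two-stage idea concrete, the following minimal sketch shows one way a bounding box and a segmentation mask could jointly constrain predicted keypoints. It is an illustration under stated assumptions, not the method evaluated in this study: the function name `refine_keypoints`, the data layout, and the rule of setting the confidence of off-animal keypoints to zero are all hypothetical, and the abstract does not specify how the mask enters the pipeline.

```python
import math
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Mask = List[List[int]]  # mask[row][col] == 1 where the animal is segmented


def refine_keypoints(keypoints: List[Keypoint], box: Box, mask: Mask) -> List[Keypoint]:
    """Hypothetical refinement: keep a keypoint's confidence only if it lies
    inside the detection box (coarse stage) and on the segmented animal
    (fine stage); otherwise set its confidence to 0.0."""
    x_min, y_min, x_max, y_max = box
    refined = []
    for x, y, conf in keypoints:
        # Coarse stage: the keypoint must fall inside the bounding box.
        in_box = x_min <= x <= x_max and y_min <= y <= y_max
        # Fine stage: the pixel under the keypoint must belong to the mask.
        row, col = math.floor(y), math.floor(x)
        in_image = 0 <= row < len(mask) and 0 <= col < len(mask[row])
        on_animal = in_box and in_image and mask[row][col] == 1
        refined.append((x, y, conf if on_animal else 0.0))
    return refined
```

In this sketch the coordinates are kept unchanged and only the confidence is modified, so a downstream skeleton builder could still decide how to treat suppressed keypoints.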