Improving Gesture Recognition Efficiency with MediaPipe and YOLO-Pose
Keywords: Gesture Recognition, Keypoint Detection, Performance Algorithms, Computer Vision, MediaPipe, YOLO-Pose
Abstract. This paper presents an improved hybrid approach to gesture recognition that combines a fast, lightweight keypoint detection stage based on MediaPipe with a highly accurate YOLO-Pose model (keypoint estimation integrated into the YOLO pipeline). This combination drastically reduces the computational load compared to traditional convolutional networks while maintaining or even improving recognition accuracy. As part of an extended study, in addition to the original experiment comparing different models on the HaGRID dataset, an additional experiment was conducted to evaluate the robustness of the system to changes in camera angle and gesture execution speed. The results show that the proposed method provides stable gesture recognition with a mean Average Precision above 0.80 even under extreme conditions, which opens up prospects for its integration into mobile and embedded systems. We also tested several artificial-intelligence ensembles for gesture detection and classification, but these traditional methods performed worse than the combination of YOLO-Pose with MediaPipe.
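Pipelines of this kind typically feed detected hand keypoints into a lightweight gesture classifier. As a minimal illustrative sketch (not the paper's exact implementation), the following shows one common preprocessing step for such a classifier: normalizing 2-D keypoints by translating them to the wrist and scaling by hand size, so the classifier becomes invariant to position and scale. The function name and the use of NumPy arrays are assumptions for illustration; MediaPipe's 21-point hand model, where index 0 is the wrist, is the assumed keypoint layout.

```python
import numpy as np

def normalize_keypoints(kpts: np.ndarray) -> np.ndarray:
    """Make 2-D hand keypoints translation- and scale-invariant.

    kpts: (N, 2) array of (x, y) pixel coordinates; index 0 is
    assumed to be the wrist, as in MediaPipe's 21-point hand model.
    """
    centered = kpts - kpts[0]                       # wrist becomes the origin
    scale = np.linalg.norm(centered, axis=1).max()  # largest wrist-to-point distance
    if scale == 0:
        return centered                             # degenerate case: all points coincide
    return centered / scale                         # coordinates now lie in [-1, 1]
```

With this normalization, the same gesture produces identical feature vectors regardless of where the hand appears in the frame or how close it is to the camera, which is what allows a small classifier to stay accurate across viewpoints.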