VCF: A Real-World Video Conference Deepfake Benchmark for Face-Swap Detection and Robustness Evaluation
Keywords: Dataset, Benchmark, Deepfake detection, Face-swapping detection
Abstract. The rapid advancement of deepfake generation techniques poses significant security and privacy risks, particularly in video conferencing scenarios, where variable resolutions, compression artifacts, and environmental factors complicate detection. Existing benchmarks often fail to address these context-specific challenges, limiting their applicability to real-world communication platforms. To bridge this gap, we introduce the VCF (Video Conference DeepFakes) dataset, to the best of our knowledge the first specialized benchmark for evaluating deepfake detection in video conferencing contexts. VCF uses the VCD dataset for target videos and the LaPa dataset as a pool of source faces, algorithmically ranking source candidates by similarity in gender, ethnicity, age, and facial hair to select the matches that yield the most visually plausible deepfakes. The dataset incorporates multi-resolution videos, H.264 artifacts produced at different compression rates, and diverse backgrounds to simulate conditions specific to video conferences. Comprehensive evaluations of 14 detection methods reveal significant performance degradation under video quality variations. Our results emphasize the critical need for robust detection frameworks resilient to resolution shifts, compression artifacts, and diverse generation pipelines. VCF provides a standardized, scenario-specific benchmark to drive advancements in securing digital communication platforms against evolving deepfake threats.
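The abstract describes ranking source faces by attribute similarity to each target. A minimal sketch of such a ranking is shown below; the attribute schema, field names, and weights are illustrative assumptions, not the actual VCF/LaPa annotations or the paper's scoring function.

```python
from dataclasses import dataclass

# Hypothetical attribute record; the real LaPa/VCD annotations may differ.
@dataclass(frozen=True)
class FaceAttrs:
    gender: str       # e.g. "male" / "female"
    ethnicity: str
    age: int          # approximate age in years
    facial_hair: bool

def similarity(src: FaceAttrs, tgt: FaceAttrs) -> float:
    """Score a candidate source face against a target face.

    Categorical attributes add 1.0 on an exact match; age adds a score
    that falls off linearly over a 20-year gap. Weights are guesses.
    """
    score = 0.0
    score += 1.0 if src.gender == tgt.gender else 0.0
    score += 1.0 if src.ethnicity == tgt.ethnicity else 0.0
    score += 1.0 if src.facial_hair == tgt.facial_hair else 0.0
    score += max(0.0, 1.0 - abs(src.age - tgt.age) / 20.0)
    return score

def rank_sources(target: FaceAttrs, candidates: list[FaceAttrs]) -> list[FaceAttrs]:
    # Best-matching source faces first.
    return sorted(candidates, key=lambda c: similarity(c, target), reverse=True)
```

In practice, a real pipeline would also need attribute predictors (or dataset labels) to populate these fields before ranking.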