The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Articles | Volume XXXIX-B3
23 Jul 2012
 | 23 Jul 2012


M. Menze and D. Muhle

Keywords: Image Sequences, Stereoscopic Vision, Image Matching

Abstract. Video surveillance systems are no longer a collection of independent cameras, manually controlled by human operators. Instead, smart sensor networks are developed, able to fulfil certain tasks on their own and thus supporting security personnel by automated analyses. One well-known task is the derivation of people’s positions on a given ground plane from monocular video footage. An improved accuracy for the ground position as well as a more detailed representation of single salient people can be expected from a stereoscopic processing of overlapping views. Related work mostly relies on dedicated stereo devices or camera pairs with a small baseline. While this set-up is helpful for the essential step of image matching, the high accuracy potential of a wide baseline and the according good intersection geometry is not utilised. In this paper we present a stereoscopic approach, working on overlapping views of standard pan-tilt-zoom cameras which can easily be generated for arbitrary points of interest by an appropriate reconfiguration of parts of a sensor network. Experiments are conducted on realistic surveillance footage to show the potential of the suggested approach and to investigate the influence of different baselines on the quality of the derived surface model. Promising estimations of people’s position and height are retrieved. Although standard matching approaches show helpful results, future work will incorporate temporal dependencies available from image sequences in order to reduce computational effort and improve the derived level of detail.