MOHE-NET: MONOCULAR OBJECT HEIGHT ESTIMATION NETWORK USING DEEP LEARNING AND SCENE GEOMETRY
Keywords: Object Detection, Height Estimation, Moving Camera, Convolutional Neural Networks, Linear MLP
Abstract. Estimating the heights of objects in the field of view has applications in many tasks, such as robotics, autonomous platforms, and video surveillance. Object height is a concrete and indispensable characteristic that people or machines can learn and capture, and many actions, such as a vehicle avoiding obstacles, are taken based on it. Traditionally, object height is estimated using laser ranging, radar, or stereo cameras. Depending on the application, the cost of these techniques may inhibit their use, especially on autonomous platforms. Using available lower-cost sensors would allow such techniques to be adopted more widely. Our approach to height estimation requires only a single 2D image. To solve this problem, we introduce the Monocular Object Height Estimation Network (MOHE-Net), a cascade of two networks. The first network performs object detection and outputs the bounding boxes of the objects of interest. This information is then input to a second network, a linear Multi-Layer Perceptron (MLP), which estimates the object height. The linear MLP models the camera-scene geometry and, unlike a conventional MLP, requires no training and contains no activation functions. The developed approach works for static camera setups as well as moving platforms. The proposed approach achieves state-of-the-art performance and can be deployed for obstacle avoidance on autonomous platforms. Our code is available at https://github.com/OSUPCVLab/Ford2019/tree/master/Moving%20Object%20Height%20Estimation%20Network
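The geometric step described in the abstract, going from a detected bounding box to a metric height without any training, can be illustrated with a minimal sketch. It is not taken from the MOHE-Net code and may differ from the paper's formulation. It assumes a pinhole camera with a horizontal optical axis, a flat ground plane, a known camera height, and an object standing on the ground. The function name `estimate_height` and its parameters are hypothetical. Under these assumptions the focal length cancels, and the height depends only on the bounding-box rows, the horizon row, and the camera height.

```python
def estimate_height(v_top, v_bottom, cam_height, c_y):
    """Estimate object height (same units as cam_height) from a bounding box.

    Assumptions (illustrative, not from the paper):
      - pinhole camera, optical axis parallel to a flat ground plane
      - image rows increase downward; c_y is the principal-point (horizon) row
      - the object stands on the ground, so v_bottom is its ground contact row
    """
    offset = v_bottom - c_y  # pixels between the horizon and the object's base
    if offset <= 0:
        raise ValueError("object base must lie below the horizon row")
    # Similar triangles: H / cam_height = (v_bottom - v_top) / (v_bottom - c_y)
    return cam_height * (v_bottom - v_top) / offset


# Example: camera 1.5 m above ground, horizon at row 240,
# bounding box spanning rows 265 (top) to 315 (bottom).
print(estimate_height(265.0, 315.0, 1.5, 240.0))  # 1.0
```

The example values correspond to a 1 m tall object 10 m from a camera with a 500-pixel focal length. A moving or tilted camera would need the pitch and camera height updated per frame, which this sketch does not cover.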