The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLII-1
https://doi.org/10.5194/isprs-archives-XLII-1-401-2018
26 Sep 2018

COMPARISON OF TWO METHODS FOR 2D POSE ESTIMATION OF INDUSTRIAL WORKPIECES IN IMAGES – CNN VS. CLASSICAL IMAGE PROCESSING SYSTEM

C. Siegfarth, T. Voegtle, and C. Fabinski

Keywords: Automatic image analysis, CNN, Shape model, Industrial Application

Abstract. Today, automatic image analysis is one of the basic approaches in the field of industrial applications. One frequent task is the pose estimation of objects, which can be solved by different methods of image analysis. Two of these were selected and investigated in this project: Convolutional Neural Networks (CNNs) and a classical method of image analysis based on contour extraction. The main point of interest was to investigate the potential and limits of CNNs in fulfilling the requirements of this specific task regarding accuracy, reliability and time performance. The classical approach served as a state-of-the-art reference. The workpiece for these investigations was a commonly used transistor element. As a database, an image archive consisting of 9000 images taken under different illumination and perspective conditions was generated. One part was used for training the CNN and for creating a so-called shape model, respectively; the rest was used to assess the extraction quality. With the CNN technique, two different approaches were realised. Although CNNs are predestined for classification, this first method delivered insufficient results. In a more sophisticated approach, the network learns the parameters of an affine transformation, which include the sought-after translation and rotation. Our experiments confirm that CNNs achieve at best a medium accuracy for rotation angles (about ± 2°), in contrast to the classical approach (about ± 0.5°). Concerning the determination of translations, both methods deliver comparable results: about ± 0.5 pixel for the CNN and about ± 0.4 pixel for the classical approach.
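The abstract states that the more successful CNN approach regresses the parameters of an affine transformation from which translation and rotation are then taken. As a minimal illustration of that last step (the function name and the assumption of a pure rotation-plus-translation are ours, not from the paper), the pose can be recovered from the six affine parameters like this:

```python
import math

def pose_from_affine(a, b, tx, c, d, ty):
    """Recover the rotation angle (degrees) and translation (pixels)
    from the six parameters of a 2D affine transform
    [[a, b, tx], [c, d, ty]].

    Illustrative assumption: the transform is a pure rotation plus
    translation, so a = cos(theta), b = -sin(theta),
    c = sin(theta), d = cos(theta).
    """
    angle = math.degrees(math.atan2(c, a))
    return angle, (tx, ty)

# Example: a 30° rotation combined with a (5, -3) pixel shift.
theta = math.radians(30.0)
angle, (tx, ty) = pose_from_affine(
    math.cos(theta), -math.sin(theta), 5.0,
    math.sin(theta),  math.cos(theta), -3.0,
)
```

In the paper's setting the network would output the six parameters directly; this sketch only shows how the two quantities evaluated in the experiments (rotation angle and translation) follow from them.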