A Straightforward Camera Calibration Method Based on a Single Low-cost Cubical Target

Calibrating optical sensors with common targets makes it efficient and convenient to acquire a sensor's internal parameters. In this paper, we present a new camera calibration method that uses a low-cost foam cube in the form of a die, exploiting the fact that the faces of a cubical die, and hence the pip patterns on them, are mutually orthogonal. First, each face and its pips are identified from the color information on the die's surfaces. Next, the pip centers are corrected using a circular projection model, and the radial distortion coefficients are estimated from one-to-one correspondences between the centers. Then, the tangent information between pairs of pips on orthogonal die faces is used to compute vanishing points, from which the intrinsic parameters are estimated. Experimental results show that our method performs comparably to the well-known checkerboard calibration method, reaching an average relative error of 2.43%, while simplifying the calibration process in practical applications and demonstrating good practicality and robustness.


Introduction
Camera calibration is a crucial step in many computer vision and robotics applications, as it compensates for internal geometric errors and, in most cases, recovers the camera orientation. Serving as a critical prelude to advanced applications ranging from autonomous navigation systems to augmented reality, camera calibration goes beyond the mere adjustment of optical variables: it aligns the digital eye with the geometry of the physical world.
Calibration methods can be broadly classified into three categories: target-based methods, self-calibration methods, and active vision-based methods (Song et al., 2013). Target-based calibration, the most widely used approach, relies on known geometric patterns. Traditional methods use specific calibration objects, such as checkerboards (Zhang, 2000), concentric circle grids (Bu et al., 2021), or deltille grids (Ha et al., 2017). With the gradual popularization of deep learning, self-calibration methods enable automatic estimation of camera parameters without external calibration equipment, significantly enhancing system flexibility and ease of use (Liao et al., 2023). On the other hand, these methods depend on the accuracy and robustness of their algorithms and require diverse scene content and motion, so they may underperform in texture-poor or dynamic environments. Active vision-based methods require the camera to execute designed motions that actively encode calibration information (Chen et al., 2024); they achieve high accuracy but need specialized hardware and precise synchronization.
Despite the advances in self-calibration and active vision-based methods, target-based calibration remains the most reliable and widely adopted approach. However, the reliance on specialized calibration objects poses certain limitations, including reduced flexibility in the calibration setup and the increased cost and effort of obtaining precise calibration targets (Luhmann et al., 2016). To address this issue, researchers have explored the use of common objects as calibration targets, such as coplanar coins (Bergamasco et al., 2014), spheres (Roman-Rivera et al., 2022), or even human faces (Nasir and Rao, 2016). These common objects offer greater accessibility and flexibility than traditional calibration targets.
To enhance the robustness of calibration with common objects while retaining its convenience, and building on our previous research (Chan et al., 2023), we propose a straightforward calibration method based on a low-cost cubical target made of a light material such as foam, namely a die-like object. Our method uses the distinctive and rich texture of dice, including the unique distribution of pips and the vivid colors on the six faces, along with the mutual orthogonality of neighboring die faces, to make the calibration process more robust. Additionally, the simplicity and ubiquity of dice make this method highly accessible and cost-effective, reducing the time required for setup and troubleshooting. Users only need to hold the imaging sensor and collect images around the die, without moving the calibration target or performing specific movement patterns.
The proposed method does not require any prior knowledge, such as the radius and relative distances of the pips on the die surface or preset internal parameters of the camera. The significance of our method lies in its potential to make camera calibration more accessible and adaptable to various real-world situations, ultimately benefiting a wide range of applications that rely on accurate camera parameters. The rest of this paper is organized as follows: Section 2 presents the mathematical model and calibration principles based on the geometric characteristics of dice. Section 3 describes the experimental design used to validate our calibration procedure and analyzes the experimental results. Finally, Section 4 concludes the paper and discusses future research directions.

Methodology
Figure 1 illustrates the workflow of our proposed method. Initially, the system identifies the die regions, and the pips on each die face, in a sequence of images captured by the RGB sensor. Since these pips generally appear as ellipses in the captured images unless the die face is exactly parallel to the image sensor plane, a circle-to-ellipse projection model is applied to correct the circle centers in the image, and a distortion estimation model is employed with nine corresponding points. Once the distortions are corrected, the intrinsic parameters are computed by exploiting the geometric constraints provided by the mutually orthogonal faces of the die. These constraints allow the intrinsic parameters to be estimated from the known geometry of the die. The innovative aspects of this method lie in its simplicity, its flexibility, and its ability to achieve accurate camera calibration using common objects. By using the inherent geometric properties of dice, the need for specialized calibration patterns or precise measurements is eliminated. The use of readily available objects such as dice makes the calibration process more accessible and practical in various settings. Furthermore, the incorporation of distortion correction and the exploitation of the orthogonality constraints of the die faces enhance the accuracy and robustness of the calibration results.
Our proposed method does not require any prior knowledge of the intrinsic parameters. The abundant geometric features and correspondence information from common objects, combined with the availability of multiple viewpoints, make the problem well-constrained, so the camera parameters can be calibrated accurately without such prior knowledge. The following subsections explain each step of the proposed method in more detail, including the arc-support ellipse detection used to identify the die regions, the radial distortion correction model, and the computation of the intrinsic parameters from vanishing points derived from the orthogonality constraints of the die faces.

Pip detection from dice
The HSV color model is used to detect the die position and extract the pips, as it is more efficient for tracking objects with clear color properties under natural lighting conditions (He et al., 2023). The images are converted from RGB to HSV color space using the OpenCV library, and the red background color of the die is defined by a specific range in the HSV space. The red regions are extracted, converted to a binary mask, and processed with contour detection to identify the connected red regions. To improve the reliability of subsequent pip detection, morphological dilation and erosion operations are applied to the red regions to remove small holes, and regions with a small area are filtered out. After that, a bounding box is generated for each valid contour to localize a possible red die in the image.
Within each bounding box, the circular pips, which appear as ellipses due to perspective projection, are detected using a fast ellipse detection method based on arc-support line segments (ASLSs) (Lu et al., 2020). The region inside the bounding box is converted to grayscale, and Canny edge detection is applied. ASLSs are extracted from the edge map, and ellipses are fitted to the ASLS groups. The ellipses are filtered by size and aspect ratio, and the red die is localized to the bounding box containing the most ellipses, each of which represents a pip on the die.

Radial Distortion Correction
After detecting the elliptical pips on each die face, they were assigned to the different planes of the cube based on their geometric properties and spatial relationships. Then, a one-to-one correspondence could be established between the same pip captured in different images, based on the relative position of each face.
However, under perspective projection the center of a circular pip does not coincide with the center of its projected ellipse, as illustrated in Figure 3. To find the corresponding points for radial distortion correction, the offset between the projected circle center and the ellipse center was computed using the circular projection model proposed by Matsuoka and Maruyama (2016). Specifically, the model is expressed in terms of a, b, c, d, e, and m, the parameters of the ellipse (which can be calculated by fitting an ellipse to the edge points extracted above), and f, the focal length of the camera. Since we do not use any prior knowledge of the focal length, f is initially set to zero and is then replaced by the computed value during iterative optimization.
The unknown offset (x, y, z) can be obtained by solving the system of equations constructed by Matsuoka and Maruyama (2016).
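The effect being corrected can be checked numerically. The sketch below does not implement the model of Matsuoka and Maruyama (2016); it only projects a tilted 3D circle with an ideal pinhole camera, fits a general conic to the projected points, and compares the conic center with the projection of the true circle center. The conic parameterization, the function names, and the example geometry are assumptions made for this illustration.

```python
import numpy as np


def project_circle(center, normal, radius, f, n=360):
    """Project n points of a 3D circle with a pinhole camera of focal length f."""
    normal = np.asarray(normal, float) / np.linalg.norm(normal)
    # Two orthonormal vectors spanning the plane of the circle.
    helper = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    pts = np.asarray(center, float) + radius * (np.outer(np.cos(t), u) + np.outer(np.sin(t), v))
    return f * pts[:, :2] / pts[:, 2:3]


def conic_center(xy):
    """Fit a*x^2 + b*x*y + c*y^2 + d*x + e*y + m = 0 and return its center."""
    mean, scale = xy.mean(axis=0), xy.std()
    x, y = ((xy - mean) / scale).T  # normalise for numerical conditioning
    design = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    a, b, c, d, e, _ = np.linalg.svd(design)[2][-1]
    # The center is where the gradient of the conic vanishes.
    center_n = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
    return center_n * scale + mean


def center_offset(center, normal, radius, f):
    """Ellipse center minus projected circle center, in image units."""
    projected_center = f * np.asarray(center[:2], float) / center[2]
    return conic_center(project_circle(center, normal, radius, f)) - projected_center


# Hypothetical geometry: circle of radius 0.05 at unit depth, f = 1000 px.
print(center_offset((0.2, 0.1, 1.0), (0.0, 0.0, 1.0), 0.05, 1000.0))  # fronto-parallel
print(center_offset((0.2, 0.1, 1.0), (0.5, 0.0, 1.0), 0.05, 1000.0))  # tilted plane
```

For the fronto-parallel circle the offset vanishes up to rounding, while for the tilted circle it is on the order of a pixel in this configuration, which is why the ellipse centers cannot be used directly as point correspondences.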
Figure 3. Circular projection and the center offset on image plane.
The corrected pip centers from multiple die faces provided point-to-point correspondences, which were then used to estimate the radial distortion coefficients following the method of Li and Hartley (2005). We adopt a one-parameter division model (Fitzgibbon, 2001) to describe the radial distortion, in which F denotes the fundamental matrix. Rearranging the resulting constraint with the Kronecker product ⊗ gives Equation (8), which can be formed once the correspondences are found. Solving it yields numerous candidate solutions for the coefficient k. Hence, a kernel voting scheme was employed to determine the coefficient with the maximum likelihood. For each estimate, a Gaussian kernel was applied to model its contribution to the final coefficient. The kernel bandwidth was set adaptively based on the number and consistency of the point correspondences. The distortion coefficient with the highest voted confidence was selected as the result for radial distortion correction.
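A minimal sketch of the two ingredients named above is given below: undistortion with the one-parameter division model, written here as x_u = c + (x_d - c) / (1 + k r_d^2), and a Gaussian kernel vote over candidate values of k. The bandwidth rule (Silverman's rule of thumb), the grid search, and the function names are assumptions, since the adaptive rule used in the paper is not spelled out; the candidate values in the usage lines are made up for illustration.

```python
import numpy as np


def undistort_division(points, k, center=(0.0, 0.0)):
    """One-parameter division model: x_u = c + (x_d - c) / (1 + k * r_d^2)."""
    pts = np.asarray(points, float) - np.asarray(center, float)
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)
    return pts / (1.0 + k * r2) + np.asarray(center, float)


def kernel_vote(estimates, bandwidth=None, grid_size=2001):
    """Return (value, density) at the peak of a Gaussian kernel density."""
    est = np.asarray(estimates, float)
    if bandwidth is None:
        # Assumed adaptive rule (Silverman); not necessarily the paper's choice.
        iqr = np.percentile(est, 75) - np.percentile(est, 25)
        spread = min(est.std(ddof=1), iqr / 1.349)
        bandwidth = 0.9 * max(spread, 1e-12) * est.size ** (-0.2)
    grid = np.linspace(est.min(), est.max(), grid_size)
    # Each estimate votes with a Gaussian kernel centred on its own value.
    z = (grid[:, None] - est[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1) / (est.size * bandwidth * np.sqrt(2 * np.pi))
    best = np.argmax(density)
    return grid[best], density[best]


# Hypothetical candidates: a tight cluster plus a few outlying solutions.
candidates = [0.048, 0.049, 0.05, 0.05, 0.051, 0.052, -0.8, 0.9, 0.3]
k_hat, peak = kernel_vote(candidates)
```

The vote is insensitive to the outlying candidates because they contribute almost nothing to the density near the cluster, which is the motivation for voting rather than averaging.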

Camera Intrinsic Parameter Estimation
With the radial distortion coefficients estimated in the previous step, the distortions in the captured images could be corrected. This allowed the camera's intrinsic parameters to be estimated by exploiting the orthogonal geometric constraints provided by a die.
The key idea for estimating the intrinsic parameters was to use the tangent information between pairs of ellipses detected on orthogonal die faces, whose tangent lines intersect at vanishing points. As illustrated in Figure 4, under perspective projection, the tangent lines of two ellipses originating from orthogonal circles intersect at the vanishing point corresponding to the direction perpendicular to both circles. In this paper, we use the method of Da et al. (2012) to compute the tangent lines of two projected circles. By identifying multiple pairs of ellipses with orthogonal relationships, vanishing points in mutually orthogonal directions could be derived. To estimate the vanishing points, we use the least-squares method to obtain the intersection of the tangent line segments. Based on the principles of perspective projection, the positions of the vanishing points in the image are directly related to the camera's focal length and principal point location. Specifically, the homogeneous coordinates of the vanishing points can be expressed as functions of the intrinsic parameters (Hartley and Zisserman, 2003). To explain in detail, we write the camera matrix K in its common form. By stacking more than six of the resulting equations, we can solve for the coefficients of the matrix M. Then, the camera matrix K can be computed by Cholesky factorization and matrix inversion, and the intrinsic parameters can be read from it.
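The linear estimation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the matrix M is interpreted here as the image of the absolute conic, M = K^{-T} K^{-1}, so that two vanishing points u and v of orthogonal directions satisfy u^T M v = 0 (Hartley and Zisserman, 2003). The coordinate normalization, the function names, and the synthetic values in the usage lines are all choices made for this example.

```python
import numpy as np


def intersect_lines(lines):
    """Least-squares intersection of homogeneous lines (a, b, c): a*x + b*y + c = 0."""
    L = np.asarray(lines, float)
    L = L / np.linalg.norm(L[:, :2], axis=1, keepdims=True)
    return np.linalg.svd(L)[2][-1]  # homogeneous point minimising ||L p||


def _constraint_row(u, v):
    """Coefficients of u^T M v = 0 in (m11, m12, m13, m22, m23, m33)."""
    return [u[0] * v[0], u[0] * v[1] + u[1] * v[0], u[0] * v[2] + u[2] * v[0],
            u[1] * v[1], u[1] * v[2] + u[2] * v[1], u[2] * v[2]]


def intrinsics_from_vanishing_points(triplets, image_size):
    """Estimate K from triplets of mutually orthogonal vanishing points."""
    width, height = image_size
    s = float(max(width, height))
    # Normalising transform for conditioning (an assumption of this sketch).
    T = np.array([[1 / s, 0, -width / (2 * s)],
                  [0, 1 / s, -height / (2 * s)],
                  [0, 0, 1]])
    rows = []
    for triplet in triplets:
        vps = [T @ np.asarray(v, float) for v in triplet]
        vps = [v / np.linalg.norm(v) for v in vps]
        for i, j in ((0, 1), (0, 2), (1, 2)):
            rows.append(_constraint_row(vps[i], vps[j]))
    m = np.linalg.svd(np.asarray(rows))[2][-1]  # null vector, defined up to scale
    M = np.array([[m[0], m[1], m[2]], [m[1], m[3], m[4]], [m[2], m[4], m[5]]])
    if np.trace(M) < 0:  # fix the sign so that M is positive definite
        M = -M
    chol = np.linalg.cholesky(M)  # M = chol @ chol.T, hence K_n = inv(chol).T
    K = np.linalg.inv(T) @ np.linalg.inv(chol).T
    return K / K[2, 2]


# Synthetic check with a hypothetical camera and three random orientations.
K_true = np.array([[1800.0, 0.0, 1900.0], [0.0, 1790.0, 1100.0], [0.0, 0.0, 1.0]])
rng = np.random.default_rng(0)
triplets = []
for _ in range(3):
    Q = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # orthonormal direction triplet
    triplets.append([K_true @ Q[:, i] for i in range(3)])
K_est = intrinsics_from_vanishing_points(triplets, (3840, 2160))
```

With noisy vanishing points the recovered M may fail to be positive definite, in which case the Cholesky step raises an error; a practical implementation would need to handle that case, for example by rejecting inconsistent triplets.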
The proposed method leveraged the rich geometric information encoded in the die faces to establish reliable constraints for intrinsic parameter estimation. By combining the estimates from multiple dice and images, a robust solution could be obtained even in the presence of noise and outliers.

Experiment
To validate the proposed method, experiments were conducted using an Azure Kinect sensor (Figure 5). The Azure Kinect is a high-resolution RGB-D camera developed by Microsoft that provides synchronized color and depth images. It was chosen as the experimental device because of the obvious radial distortion in its raw images and the availability of manufacturer-provided intrinsic parameters, which could serve as reference values for evaluating the accuracy of our estimation. In our experiment, we captured a set of images containing a commonly used die from various viewpoints and distances using the Azure Kinect image sensor at 2160p (3840×2160) resolution. The die was placed on a flat surface, and the camera was moved around it to capture images from different angles. To assess the accuracy of our method, we selected the widely used checkerboard calibration method (Zhang, 2000) as a reference. Since we used a common object for calibration, we did not employ a professional calibration checkerboard; instead, we used a checkerboard printed in advance on A4 paper, which is the approach most people use. This approach is also quite common in research areas such as SLAM (Simultaneous Localization and Mapping).

Result of radial distortion estimation
We used multiple photos to estimate the radial distortion value. Because different combinations of photos produce different estimates, we performed kernel density estimation on these estimates, and the peak of the resulting distribution is quite pronounced. The corresponding kernel density is 0.3479, so the peak reflects the radial distortion parameter relatively accurately, as illustrated in Figure 6.

Method | Radial distortion parameters
Manufacturer's initial values | k1 = 0.479, k2 = -2.685, k3 = 1.609, k4 = 0.357, k5 = -2.505, k6 = 1.530
Zhang's method | k1 = 0.1031, k2 = -0.027
Our method | k = 0.0487

Table 1. Radial distortion parameters estimated by different methods.

Because the calibration methods use different distortion models, the parameters in Table 1 cannot be compared quantitatively. However, as shown in Figure 7, our distortion correction is visually comparable to that of Zhang's checkerboard calibration method, demonstrating the effectiveness of our approach.

Result of intrinsic parameters
The estimated intrinsic parameters, including the focal lengths and the coordinates of the principal point, were compared with the manufacturer's initial values and with those obtained by Zhang's method, as tabulated in Table 2. To comprehensively evaluate the performance of our method in estimating the intrinsic parameters, we use the relative error with respect to the manufacturer's initial values, computed as the percentage difference from those values. Specifically, for the focal lengths fx and fy, our method yields 1825.8 and 1796.0, with relative errors of 1.45% and 0.21%, respectively, which are closer to the manufacturer's initial values than the results of Zhang's method. However, there is still a significant discrepancy between our estimates of the principal point coordinates and the initial values, particularly for the cx coordinate, which has a relative error of 7.11%. In terms of overall precision, measured by the average relative error, our result is very close to that of Zhang's method: 2.43% for ours and 2.42% for Zhang's.
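The error metric and the reported figures can be cross-checked with a few lines of arithmetic. The sketch below assumes that the average relative error is taken over the four parameters fx, fy, cx, and cy, which the text does not state explicitly; under that assumption, the three reported errors and the 2.43% average imply the relative error of cy.

```python
def relative_error(estimate, reference):
    """Relative error in percent with respect to a reference value."""
    return abs(estimate - reference) / abs(reference) * 100.0


# Relative errors reported in the text, in percent.
reported = {"fx": 1.45, "fy": 0.21, "cx": 7.11}
average = 2.43

# Assumption: the average is taken over fx, fy, cx, and cy.
implied_cy = 4 * average - sum(reported.values())
print(f"implied cy relative error: {implied_cy:.2f}%")
```

Under this assumption the implied cy error is roughly 0.95%, which is consistent with the statement that cx dominates the principal point discrepancy.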

Discussion
Based on the data in Table 1 and Table 2, our method is effective in estimating the radial distortion and focal length parameters. As we did not use a high-precision calibration board, it is likely that Zhang's method absorbed some of the feature point error during bundle adjustment, making its focal length calibration less accurate than ours. We believe the poorer performance in calibrating the principal point may be due to insufficient constraints in our estimation of the matrix K, as well as the relatively large residuals of some extracted vanishing points. This suggests that further improvements are needed to determine the principal point coordinates more accurately, such as providing an accurate initial guess or imposing upper and lower bounds on the final result.

Conclusion
In this study, we propose a novel and straightforward calibration method for digital imaging sensors using a low-cost and easily accessible cubical target, a foam die. By leveraging the unique texture features and geometric properties of dice, our method demonstrates robustness and flexibility in various environments while remaining user-friendly. The proposed workflow effectively estimates the distortion parameters and intrinsic camera parameters without requiring prior professional knowledge or specific movement patterns. Experiments validate the accuracy and potential of our approach, with less than 1% average difference compared to Zhang's well-known checkerboard calibration method.
In future research, we will focus on enhancing the robustness and applicability of the proposed calibration method by handling more complex scenarios, integrating deep learning techniques for improved accuracy and efficiency, and conducting extensive evaluations and comparisons with state-of-the-art methods to facilitate wider adoption in practical applications.

Figure 1. Workflow of our proposed method.

Figure 2 shows example intermediate results of segmenting a single red die and detecting its white pips. By leveraging color information and geometric constraints, the proposed method robustly extracts the die and its pips.

Figure 2. Detection process of pips on a die. (a) Original image; (b) red area mask; (c) die region of interest detection; (d) edges of localized pips on the die.

Figure 4. Vanishing point extraction based on geometry features on dice planes.

Figure 6. Estimating the distortion parameter with the kernel voting scheme.