A non-parameterized method for co-registration of panchromatic and multispectral images

Precise co-registration of panchromatic and multispectral images can be challenging due to the imperfect alignment of different sensors from the same platform or the involvement of different platforms. However, conventional methods heavily depending on the quality of feature matching and parameterized model fitting fail to yield an accurate result if the relative deformation between images is large. We propose a non-parameterized method that is free of such problems. A basic functional model is established with the consideration of equal radiance and smooth regularization. The local radiance deformation and the self-tuning weighting are then introduced to make the model more suitable for the specific requirement of co-registration. The model is finally solved with a twostage coarse-to-fine optimization approach. Our experiment on ZY-3 (China) images demonstrates its superiority over conventional methods, especially when large deformation due to terrain relief and sensor mis-alignment exists.


INTRODUCTION
The remote sensing technique usually collects earth observation images from air-or space-borne sensors.To meet specific practical requirements, these sensors are designed to work at quite different spatial and spectral resolutions.However, the greedy demand for high resolutions of both types is unable to be considered in a single image due to the limitation in radiance energy and space-to-earth data transmission (Wöhler and Hafezi, 2005).Usually high (spatial) resolution images are available only in panchromatic band while multispectral images suffer from low spatial resolutions.Thus studies in image fusion become necessary to cope with this discrepancy by combining two types of images into a high resolution multispectral image.
Image fusion generally involves two steps (Chavez et al., 1991): geometric co-registration and spatial-spectral content mixing.Existing studies mainly focus on the latter to develop methods simultaneously preserving spectral information and representing spatial details.Great progresses have been achieved through the last decades.Considering the co-registration, however, a few studies are published.Most of platforms simultaneously collecting panchromatic and multispectral images (e.g., Leica ADS 70，Zeiss/Intergraph DMC, and Microsoft UltraCam for airborne sensors or Quickbird, Ikonos and GeoEye for satellite sensors) are benefited from the precise alignments of sensors or a careful geo-registration preprocess.The relative deformation of two images is thus reduced to an insignificant level for the subsequent mixing process.However, in some other situations the issue might not be so optimistic.Some multi-linear array satellites such as SPOT-5 and ZY-3 (China) provide several panchromatic images from very different viewing angles for stereo vision.It would potentially be valuable to fuse these images with the multispectral one in terms of information enhancement.Therefore, their co-registration is very necessary.In some other cases like the IRS-P5 satellite, without available multispectral observation onboard, the fusion can only be successfully achieved with external images from, for example, the IRS-P6 satellite (Yakhdani and Azizi, 2010).It is another interesting case of multi-source fusion which is heavily affected by the differences in geometric orientation, spectral responses, and time.In these cases, the co-registration is not part of the standard process on the image provider side, so it should be carried out by the user.This study tries to give a practical solution to this problem.The discussion is confined to satellite images without significant temporal difference here.
Certainly the geo-registration is still applicable, but without DEM or precise orientation parameters it becomes an extremely difficult mission.Researchers alternatively turn to developing feature-based co-registration methods.The widely adopted framework can be depicted similarly with that in (Blanc and Wald, 1998): (1) TP (tie point, or feature point) extraction in the reference image (often the high resolution image); (2) Matching TPs in the second image (often the multispectral image) with careful quality control; (3) Estimate the relative deformation between images based on matched TPs.
In some works the first two steps have been extended to line matching (Eugenio and Marqué s, 2003) or region matching (Flusser and Suk, 1994).Nevertheless, it is regarded as a standard framework for image co-registration.One may notice that the accurate deformation estimation is only possible in regions characterized with distinct features (e.g.corner points, contours, etc.).Other regions or pixels must be predicted in order to achieve the usually demanded pixel-level fusion (Pohl and Van Genderen, 1998).As a common practice the interpolation is popularly adopted.Besides, the model fitting approach is also applicable if the deformation model is prior known to be a parameterized function.
We define the above-mentioned approaches uniformly as the ones based on parameterized model.The fitting approach is quite direct that it equates the deformation determination to the parameter estimation of functions (e.g.affine function, polynomial, etc.).On the other hand the interpolation approach is also parameter-based.Taking the bilinear interpolation for instance, it solves the three parameters of a plane and predicts inside deformations for each triangle patch constructed by adjacent TPs.Once a parameterized model is adopted, its correctness, as well as the density and accuracy of TP matching, becomes the vital factor tightly associated to the accuracy of coregistration.A typical example is illustrated in Figure .1.An arbitrary deformation leads to an erroneous co-registration if the parameterized method is applied.The above deformation was derived from an arbitrary function that cannot be represented by a simple parameterized model.In order to pursue a good result the matched points should be both dense and accurate to apply parameterized methods locally.However, the large deformation leads to an erroneous matching result.Though the automatic removal of anomalies is applied, there still remain some false matches due to the ambiguity along the road.Moreover, the density of matched points is significantly reduced.This is the critical limitation of the parameterized method.
This paper proposes a non-parameterized co-registration method to cope with the annoying trouble faced by the conventional parameterized method.Differing from the common scheme (Zitova and Flusser, 2003), it needs neither feature-based correspondence nor parameterized deformation model and provides accurate pixel-wise geometric deformation estimation.
With self-tuning edge-enhancing weighting and regularization constraint, the method is able to achieve rigorous matching for featured regions while produces smooth estimation for others.We also introduce additional radiance difference estimation which effectively avoids the false correspondence caused by systematic radiance difference through various sensors.The experiment on the ZY-3 images is then conducted to evaluate this method.

NON-PARAMETERIZED CO-REGISTRATION
The concept of non-parameterized co-registration is proposed to discriminate from the conventional parameterized one.In the case of remote sensing images, the relative deformation basically comes from the joint effect of un-aligned orientations and the terrain relief.This may be a very complex model difficult for parameterized methods in some situations (e.g. Figure 1).By jumping out of this conventional parameterized framework, we propose another solution with a quite straightforward thinking.Since the pixel-level co-registration is required, the deformation should be estimated pixel by pixel.In the aspect of solved variables, the corresponding model involves no parameterized function but a field of vectors implying the deformation of each pixel.The deformation can be directly estimated with this model without post process such as interpolation or model fitting.

The basic functional model
Initially we define the pixel-wise deformation as u and v in the horizontal and vertical directions respectively.The correspondence between two images is then expressed as: where P is the radiance value of the panchromatic image, M is the multiband radiance values of the multispectral image, and Θ is the transformation from the multispectral radiance to the panchromatic radiance.Readers can refer to (Wang et al., 2005) to find a physically rigorous form of Θ.Under the assumption of precise radiance transformation and noise free, the equation conveys a simple idea that the radiance stays unchanged after deformation which is an important cue for deformation estimation.It shares common points with feature point matching but replaces the feature similarity with the radiance equality, making variational solution possible as we will see later.
Unfortunately this single equation is not robust at all comparing to feature matching.We introduce an additional regularization constraint to relieve the problem.Then a functional model is obtained: where Ω is the space of image plane, and α and β are the weight values for the two terms respectively.Herein we use a function formation embedded in W 1,2 space (Aubert and Kornprobst, 2006) instead of the original matrix for bi-directional deformation to derive the functional model.The regularization constraint is provided by the penalty of large gradients in u and v with a L 2 -norm.In other words, the deformation should be largely smooth without significant discontinuities.It can effectively exclude mismatches which usually break the smoothness constraint.So the aim of ( 2) is to find the optimal estimation of deformation which makes a perfect balance between radiance equality and deformation smoothness.The weights α and β are designed to distribute the relative importance of these two considerations.
In fact (2) is equivalent to the old model proposed in (Zhang, 2004).It is to impose isotropic smoothness constraint in the solved deformation field with a constant β.In addition, the strong L 2 -norm constraint on the radiance equality may be problematic since it is not robust enough to reject the noise.The solution space of (2) i.e. the W 1,2 space does not allow for any discontinuity e.g. the boundary of buildings.A lot of studies focus on these problems and propose many practical and advanced methods like the anisotropic regularization, the total variation regularization (Chambolle, 2004) and robust penalty (Werlberger et al., 2009).Though their strong capability is proved, they are often not economic to solve the practical remote sensing image co-registration problem considering their large expense for computer memory and processing time.In our current study, we do not consider urban areas, so the discontinuity is not involved here.According to our experiments, the radiance noise seems not to be very critical in co-registration of modern satellite images.Instead the systematic radiance difference between different sensors is of more importance.Meanwhile the isotropic smoothness constraint is not very appropriate indeed.We will provide corresponding solutions for the two challenges as follows.
Readers may notice that model ( 2) is identical to the optical flow model in computer vision studies.Indeed we borrow the idea from the latter.Researchers form computer vision areas are mostly interested in the movement of the foreground object from a video sequence.The discontinuity between foreground and background objects should be paid more attention while the pixel-to-pixel correspondence inside a continuous region is less challenging due to the small deformation through video images.
However, for remote sensing image co-registration the discontinuity seldom occurs except in urban areas.As a result, the main consideration is paid to the precise pixel-to-pixel correspondence under a potential large deformation.The similar idea is also found in previous studies (Fonseca et al., 1999;Liu and Yan, 2008) but these works only used the optical flow to assist the accurate matching of feature points rather than estimate pixel-wise deformation directly.

Radiance difference estimation
The inevitable challenge in image fusion comes from the incompatibility of spectral response in different sensors.The working spectrums of panchromatic and multispectral sensors are often not identical, especially when different satellites are considered.Even if the spectrums have a perfect overlap, the respective contribution to the panchromatic response from each multispectral portion, which is presented by Θ in (1), remains unknown if the image provider does not publish that.Through the digitalization of the spectral response, the difference in gain and offset parameters brings in extra difference in the final image.
All these factors cause a systematic radiance difference between the processed two images (Figure 2).The strong L 2 -norm constraint on the radiance equality is not robust to cope with this problem.So we introduce additional variables to explain for the radiance difference: where r 0 and r 1 are the pixel-wise additive and multiplicative factors respectively, and η 0 and η 1 are their corresponding regularization weights.
In this extended model, we assume that the transformed radiance has a locally linear deformation relative to true panchromatic radiance.The assumption introduces additional variables with double number of pixels, which depicts the additive and multiplicative effects at each pixel.Such a large size of unknown variables has made the model very unstable, so the regularization is as well imposed on the new variables.The regularization is quite natural since patterns of radiance difference in neighboring pixels are close to each other; otherwise difference is not systematic but random, contrary to the above discussion.We use an isotropic regularization on the new variables because anisotropic version produces little improvement and increases the complexity in our experiments.
The first term in (3) implies an interesting interrelation between geometry-relevant (u and v) and radiance-relevant variables (r 0 and r 1 ).The optimal solution should have a compatible estimation of both types of variables and meanwhile meet the regularization conditions.According to our experiments, their simultaneous estimation becomes quite unstable due to the large magnitude difference.We prefer a twostage strategy as a substitute.In the first stage, the geometryrelevant variables are fixed to solve the radiance-relevant variables.Then the condition reverses to solve the optimal geometric deformation.These steps are iteratively operated on the image until the global optimum is reached.With the pyramid strategy and carefully selected down-sampling ratio we find that one two-stage step for each level is quite enough to obtain a satisfactory result.
Figure 2. The panchromatic (left) and transformed multispectral (right) images of a same region.Both images are from the ZY-3 satellite where the nadir panchromatic sensor and multispectral sensor almost cover the identical spectrum.We transform the multispectral image with the average operation, but there still remains apparent systematic radiance difference.

Edge-enhancing weighting
From the perspective of differential geometry, the L2-norm regularization constraint imposed on the gradient of u and v equivalently minimizes the curvature of these two variables.Without other constraints (such as the radiance equality), the final solution is known as a rigorous plane, which is definitely not what we want.The radiance equality controls the smoothing tendency driven by the curvature minimization principle.If the radiance quality constraint is allocated with a larger weight, the regularization force on the corresponding pixel is accordingly reduced.It leads to an analogous solution with the anisotropic regularization constraint.So we attempt to cope with the problematic isotropic regularization by self-tuning radiance equality weight in this study.
In the image fusion application, some parts of the image which include distinct features should be forced to strictly match their counterparts in the other image.It is understandable since the aim of image fusion is sometimes stated as the integration of different features from different images.The least requirement is naturally to be met that the re-occurred features should be precisely matched to each other.For the regions revealing distinct features we will impose a stronger weight α than other regions.The curvature of deformation in these regions is allowed to be large, that is, the smooth constraint is secondary to be considered.
Yet the distinctness of features is an open problem.In this paper, we relate it to the edge strength which is calculated as the magnitude of Gaussian-filtered gradients.This definition for edge strength is able to capture intensity discontinuity while suppress unexpected noise and thus is widely used in many studies e.g. the famous Canny operator.Once a strong response of edge is detected, an additional edge-enhancing weight is added to the initial weight.The final expression to calculate the weight of the pixel-wise data term is given as: where α 0 is the initial weight for the data constraints, α e is the maximum value for edge-enhancing weight, T is the threshold to truncate large edge strength, γ is a power coefficient, and G* means the Gaussian convolution.In practice most of the edges we are interested in always associate to a big edge strength value but smaller than the largest one.The value can be estimated statistically with robust estimators like the 80%quantile.We use it as the threshold T to avoid over-suppression on interested edges caused by too large edge strength maximum.The γ is designed to achieve a non-linear dependence of the edge-enhancing on the edge strength.It is obvious that a large γ reduces the edge-enhancing weight on the pixel with small edge strength, i.e. featureless region, effectively avoiding the abuse of the new weight.Since ( 4) is calculated based on the panchromatic image, it only needs to be evaluated once at each pyramid level without updating through iterations.

The Euler-Lagrange equation and solving process
The classical solution of the typical variational problem (3) is given by the Euler-Lagrange equation.Since we adopt a twostage strategy to solve geometry-relevant and radiance-relevant variables separately, the EL equation should be accordingly derived for each stage respectively.We have: (5) for the first stage; and: for the second stage.The arguments x and y are left out here.Readers should be aware that these equations are derived for each pixel on the panchromatic image.And the value of M is predicted by the current geometric deformation variable attached to each pixel.
Replacing the Laplacian operators with their discrete form cheerfully leads to linear equations about the solved variables for most terms in ( 5) and ( 6).Terms relevant to transformed multispectral radiance in (6), however, is not linear due to the nonlinearity and nonconvexity of the image function.The problem is critical since it misleads the solution to converge to a suboptimum.In our work the image pyramid is pre-calculated to perform a coarse-to-fine optimization.This is a widely-used and practical strategy to avoid suboptimum convergence.The final solution process of our method is shown in Table 1.

1
Up-sample the multispectral image to a similar resolution with the panchromatic one and do the radiance transformation.

2
Build pyramids for the panchromatic image and transformed multispectral image respectively.

3
Starting from the coarsest level, repeatedly solve ( 5) and ( 6) at each pyramid level.

4
If the finest level is reached, output the deformation estimation u and v. Otherwise, interpolate u, v, r 0 and r 1 to the finer level.

5
Warp the multispectral image based on the estimated deformation to perform the pixel-level fusion process.
Table 1.The solution scheme of our method

EXPERIMENTS AND ANALYSIS
In this section we present results of co-registration with the ZY-3 images.The main part of the experiment is conducted on the simultaneously captured images of the same satellite.ZY-3 is equipped with the three-linear CCD camera similar to SPOT-5.The onboard 5.8m VNIR sensor is aligned to the 2.1m nadir panchromatic sensor in order to minimize the relative deformation between images collected by them.However, the alignment is not precise enough so the deformation should be estimated by co-registration.Moreover, we attempt to integrate the multispectral information into the forward and backward panchromatic sensors (both with deviation of 22 degrees with respect to the nadir one at 3.6m resolution), an effort to consider even larger deformation.
The experimental site belongs to the Taihang Mountains, located near the boundary of Shanxi and Hebei province, China.We select the image (four sensors are all available) captured on May 10 2013 to evaluate the performance of our method.The topography is typically mountainous with elevations changing from 900m to 1400m.Two co-registration experiments are conducted on the nadir-multispectral (NAD-MS) pairs and forward-multispectral (FWD-MS) pairs independently.The RMSE assessment of the predicted deformation is presented in Table 2.The RMSE (in pixel) of deformation predicted by parameterized (P.) and non-parameterized (Non-P.)methods on the two co-registration experiments.In this case of ZY-3 images, u is deformation across the flight direction while v is deformation along the flight direction.The circular accuracy of the vector (u, v) is implied by RMSEtotal.
The result of our non-parameterized method is produced with an experimentally designed setting: α 0 = 800, α e = 1,500, γ = 2, β = 1, η 0 = 6 and η 1 = 6.We also use the ENVI 4.8 software to produce the result of the compared conventional parameterized method.Specifically feature points are extracted with the Moravec operator with average spacing of 30 pixels.Given an initial guess of deformation, the matching process is then achieved with 11*11 patches to calculate normalized correlation coefficients (NCC) and 31*31 windows to search for potential correspondences.False correspondences are inevitable after this primary process, so we filter out matches with NCC below 0.8 and further remove anomalies by local plane fitting with a tolerance of 3 pixels.The final pixel-wise deformation estimation is achieved by triangular linear interpolation.
When the co-registration is between the nadir and multispectral images, two methods have similar accuracy.The small relative distortion has been effectively guaranteed by sensor alignment, so the accurate correspondence becomes easy for both methods.
Our method has a little advantage over the other since it is a true pixel-to-pixel method without any interpolation.The FWD-MS case, however, is more challenging with orientation deviation as large as 22 degrees.The deformation along the flight direction is not systematic at all but dependent on the terrain relief while the deformation across the flight direction is mainly linear caused by the resolution difference of two sensors.The result in Table 2 illustrates the failure of conventional method in dealing with the large distortion in the along-flight direction caused by mountainous topography.Contrarily, our method still can control RMSE v in the same level with that in the previous NAD-MS experiment.Some details of the rectified images based on the estimated deformation are displayed in Figure 3, offering a direct impression of the rectified result.Though the parameterized method accurately matches features when the distortion is small, it performs quite poorly with some false matching of roads and mountain ridges.It is a joint result of sparse matched points and inappropriate deformation model (linear triangles).In fact we have initially extracted 2,500 feature points and only 879 of them are retained after filtering anomalies.With a loose threshold the number could certainly be increased but meanwhile brings the problem of plentiful false matches.The non-parameterized method is free of such dilemma for the reason that it conducts the true pixel-to-pixel correspondence with no need to remove anomalies afterwards.The final deformation estimated by two methods is visualized in Figure 4. We find that the non-parameterized method successfully captures the detail deformation in contrary to the conventional method.Furthermore if the accuracy orientation parameters are known, the deformation reduces to one direction and can be estimated more accurately -that is the main focus of stereo matching studies.There is also the requirement for co-registration of images from either different satellites or different flights of the same satellite.
In either situation images are captured from different positions and orientations, so images suffer from non-systematic deformations in dual directions, which raise the difficulty of coregistration.We conduct the co-registration experiment on the backward panchromatic image from the previous experiment and a multispectral image from another flight 10 days later.Some details of the result are displayed in Figure 5.
The result is consistent with the previous experiment of simultaneously captured images.It suggests that our method is also applicable and advantageous when dual-directional nonparameterized deformations occur.It should be noted that the non-systematic deformation in the across-flight direction does not get as large as that in the orthogonal direction due to the narrow view field of linear sensors and the parallel relationship of different flights.A more challenging situation might be encountered if the co-registration is required for arbitrarily different satellites where the two favorable conditions are probably broken.

SUMMARY AND CONCLUSION
We have introduced a pixel-wise co-registration method that does not require for any parameterized deformation model for co-registration of panchromatic and multispectral images.The new method is originally derived based on pixel-wise radiance equality and additional regularization constraints.The basic functional model is then extended with radiance difference estimation and edge-enhanced weighting.According to experiments on the ZY-3 images, the new method is able to recover smooth and detailed deformation caused by the joint effect of terrain relief and orientation deviation of sensors.It is superior to the conventional parameterized method especially when the deformation gets large.Our method would be beneficial to register multispectral images to high resolution panchromatic images captured from a totally different orientation.Future studies may focus on the co-registration between different satellites.The incorporation of matched feature points is also attractive since it would be helpful to accelerate our method and avoid false convergence.

Figure 1 .
Figure 1.The original panchromatic ortho-image (left) is warped arbitrarily to generate the middle image.Then the conventional feature-based method is applied to predict the deformation.The result is shown to the right, where bars alternately display the original image and the rectified image based on the estimated deformation.

Figure 3 .
Figure 3.The left column is the rectified multispectral image overlapping on the forward panchromatic image produced by the parameterized method while the right column is produced by the non-parameterized one proposed in this study.

Figure 4 .
Figure 4. Deformation in across-flight direction (top row) and along-flight direction (bottom row).The left column is produced by the parameterized method while the right column is from the non-parameterized one proposed in this study.

Figure 5 .
Figure 5.The co-registration result of ZY-images from different flights.The left column is produced by the parameterized method while the right column is produced by our nonparameterized method.