INCORPORATING LAND-USE MAPPING UNCERTAINTY IN REMOTE SENSING BASED CALIBRATION OF LAND-USE CHANGE MODELS

Building urban growth models typically involves a process of historic calibration based on historic time series of land-use maps, usually obtained from satellite imagery. Both the remote sensing data analysis to infer land use and the subsequent mode lling of land-use change are subject to uncertainties, which may have an impact on the accuracy of future land-use predictions. Our research aims to quantify and reduce these uncertainties by means of a particle filter data assimilation approach that incorporates uncertainty in land-use mapping and land-use model parameter assessment into the calibration process. This paper focuses on part of this work, more in particular the modelling of uncertainties associated with the impervious surface cover estimation and urban land-use classification adopted in the land-use mapping approach. Both stages are submitted to a Monte Carlo simulation to assess their relative contribution to and their combined impact on the uncertainty in the derived land-use maps. The approach was applied on the central part of the Flanders region (Belgium), using a time-series of Landsat/SPOT-HRV data covering the years 1987, 1996, 2005 and 2012. Although the most likely land-use map obtained from the simulation is very similar to the original classification, it is shown that the errors related to the impervious surface sub-pixel fraction estimation have a strong impact on the land-use map’s uncertainty. Hence, incorporating uncertainty in the land-use change model calibration through particle filter data assimilation is proposed to address the uncertainty observed in the derived land-use maps and to reduce uncertainty in future land-use predictions. * Corresponding author. This is useful to know for communication with the appropriate person in cases with more than one author.


INTRODUCTION
To understand how changes in urban form are related to urban processes driving these changes, increasing use is made of urban growth models.Most of these models require data on topography, road infrastructure, as well as detailed information on land-use/land-cover change.The latter is usually obtained from visual interpretation of historic time series of aerial photographs or satellite imagery, complemented with ancillary information.Recently, Van de Voorde et al. (2011) presented a method for extracting residential and employment related urban land-use patterns from time series of medium-resolution satellite data through the analysis of the density and the spatial distribution of impervious surface cover within each street block, estimated at sub-pixel level.Based on this method, a framework for calibration of the well-known MOLAND urban growth model was proposed.The approach relies on the comparison of spatial metric values, describing specific characteristics of land-use patterns derived from remote sensing, with corresponding metric values obtained from simulated land-use maps (Van de Voorde et al., 2012).Parameters used in the simulation model are tuned in such a way that the simulated patterns of urban growth, as described by the metrics, match the patterns observed in the remote sensing imagery.One of the difficulties in the proposed approach is the uncertainty that is present in the land-use maps obtained through remote sensing, as well as in the estimation of landuse model parameters, and the impact this uncertainty has on the calibration process.Previous research has demonstrated the importance of investigating and quantifying error propagation when dealing with land-use/land-cover maps.Burnicki et al. (2007) studied the effect of spatial and temporal dependencies in land-cover error patterns on land-cover change maps.In Canters et al. (2002), the input uncertainty in a land-cover map and in a DEM, as well as their combined impact on the outcome of a structural classification of landscape types were assessed.In both studies, the magnitude and the spatial pattern of uncertainty are analysed by producing error-sensitized versions of data input and model outcomes through Monte Carlo simulation.
In this paper, both stages of the remote sensing data processing chain, i.e. the sub-pixel estimation of impervious surface cover for each urban pixel and the multi-layer perceptron (MLP) IKONOS images that overlap the study area were resampled from a 4 m to a 5 m resolution, facilitating a downgrading to the 30 m Landsat resolution later on.Next, the NDVI was calculated and a threshold was set in order to create a binary vegetation map from the IKONOS data.The mediumresolution images were co-registered to these IKONOS images in such a way that the Landsat and SPOT pixels overlap exactly 6 x 6 and 4 x 4 high-resolution cells, respectively.To improve the discrimination between different urban land-use classes, address point data were included in the analysis as well.For Flanders, use was made of the CRAB database; because of the lack of operational data for the Brussels region, Sitex data on the number of floors for each building in the UrbIS large-scale reference map of Brussels (1997)(1998) served as address proxies.Since our interest lies in detecting land-cover patterns, it was decided to define street blocks by intersecting the road network with four urban classes derived by aggregation from the VITO reference land-use map of Flanders of 2010: 1) residential, 2) commercial, industrial and services, 3) recreation and 4) parks.Blocks smaller than 1 ha were merged with a larger neighbour for obtaining units that are appropriately sized to represent the spatial pattern and composition of impervious surface cover within its boundaries.This way, 34 512 regions were delineated.

Spectral mixture analysis of impervious surface cover
In a preliminary step, the known spectral confusion between bare soil and impervious surfaces in the medium-resolution images was addressed by defining a mask to identify all pixels belonging to the urban area.The ISODATA clustering based unsupervised classification method was used to separate four major land-cover classes: urban areas, pure vegetation (trees, crops and pasture), bare soil and water.Next, a separate stepwise linear regression model was developed for each date's urban class pixels with vegetation fraction as the dependent variable (subtracting its values from 1 provides the impervious surface fraction).For the unmixing of the Landsat image, all bands except the thermal infrared band, together with the NDVI, were used as independent variables.For the SPOT images, all three (1996 and 2005) or four (2012) bands and the NDVI served as explanatory variables.Data needed for the calibration and validation of these models were randomly sampled from the part of the Landsat/SPOT images that overlap the IKONOS image.As each sampled pixel covered exactly 36 (6 x 6) or 16 (4 x 4) binary IKONOS pixels, reference vegetation proportions could be easily determined by spatial averaging.A temporal filtering technique based on iterative linear regression between NDVI values obtained from the medium-and high-resolution imagery was applied to remove pixels from the initial training sample that underwent changes in vegetation cover between the acquisition dates of the Landsat/SPOT images and the IKONOS image.Retaining those filtered pixels that coincide with the urban mask, a final random sampling of 3000 training and 3000 validation samples was selected.Based on the validation samples, the mean error, mean absolute error and root mean squared error were calculated to assess the accuracy of the impervious surface proportion estimates.

Land-use classification based on urban morphology
The same set of spatial metrics as defined by Van de Voorde et al. (2011) (the reader is referred to this publication for further details) was used to relate impervious surface cover within a street block to urban morphology: average impervious surface cover, spatial variance and four parameters of the fitted logistic curve of the impervious surface fraction's cumulative frequency distribution within the block.In this research, address density was added to this set as a seventh variable.This variable was calculated by summing the number of mailboxes of all buildings within the block's boundaries and dividing this value by the block's area.Using these variables, a supervised, MLP neural network classification approach was adopted to assign each street block to one of the following land-use classes: 1) residential, 2) employment (commercial areas, industrial zones, public or private services) and 3) green (parks and recreation).Training and validation data were obtained by labelling the street blocks according to the predominant land-use class of the 2010 landuse map of Flanders within their boundaries.Since this reference land-use map is the only ground truth we have at our disposal, a sequential temporal filtering approach based on the blocks' mean spectral values was applied to only select street blocks whose land did not change from 1987 to 2012.Eventually, all blocks that had been temporally filtered in this way were used for training of the classifier for each of the four dates.For validation, the same set of 300 randomly selected samples per class was used for each date.

Uncertainty in the impervious surface estimation:
The uncertainty associated with estimating impervious surface fractions using linear regression unmixing was simulated by adding spatially correlated error fields to the original impervious surface fraction map using the following first-order autoregressive model (Heuvelink, 1998): (1) where I[i,j], ε[i,j] = the simulated, spatially correlated impervious fractional error value and the value from a normally distributed random noise field with mean value zero and standard deviation one, respectively, for the (i,j)th cell q, r = the parameters expressing the spatially dependent and spatially independent component's contribution, respectively In this study, q was estimated to be 0.1125.Eventually, the sub-pixel estimation's uncertainty contribution was evaluated by classifying one hundred perturbations of the original impervious surface map using a separately trained neural network for each perturbation.Proportional to these posterior probabilities, each target class was assigned to a sub-interval in a 0 to 1 range, characteristic of each block.Next, a stochastic model was applied by randomly drawing a number between 0 and 1 from a uniform distribution and assigning each block to the land-use class corresponding with the sub-interval that includes this random number.For blocks receiving the residential class label, an additional distinction was made between low-density (up to 50% impervious surface cover) and medium-to-high-density (more than 50% impervious surface cover) residential.The classification's uncertainty contribution was evaluated by applying the described Bayesian stochastic procedure one hundred times on the same, original impervious surface map.

Combined uncertainty:
Finally, the combined impact of both stages on the land-use map's uncertainty was assessed by subjecting each of the one hundred impervious surface map perturbations to a separately trained MLP classifier and applying the Bayesian stochastic approach to generate one hundred realisations of the land-use map, taking into account the uncertainty in the classification process.

Magnitude and spatial distribution of uncertainty:
The magnitude and spatial distribution of the uncertainty of the land-use maps are obtained by mapping for each block the frequency of occurrence of the modal class.A high frequency of occurrence is associated with a low degree of uncertainty (Canters et al., 2002).To evaluate whether the degree of uncertainty differs significantly between both uncertainty types, as well as between each uncertainty type and their combined impact, the Wilcoxon signed rank test was applied to investigate whether the distribution of the group with positive modal frequency differences is equal to the distribution of the group with negative modal frequency differences.If not, the true location shift between both frequency distributions is significantly different from zero, which indicates a clear difference in the degree of uncertainty.

Robustness to uncertainty:
The robustness to uncertainty of the land-use mapping approach was evaluated by studying the correspondence between the original classification and the most likely land-use map obtained from the Monte Carlo analysis, i.e. the modal class map, by cross-classifying the class labels for all blocks.For each cross-table, the assumption of marginal homogeneity can be tested: if the marginal distribution of the land-use classes remains the same after accounting for uncertainty, the source of uncertainty has no significant impact on the land-use determination.Marginal homogeneity can be tested both globally and specifically for each land-use class.In the first case, marginal homogeneity is rejected if the deviance for the symmetry and quasi-symmetry models fitted to the cross-table is considered to be too large in a Likelihood Ratio chi-squared test (see Agresti, 2002: 429).
Regarding the class-specific evaluation, marginal homogeneity is rejected if zero is not included in the large sample 95% confidence interval for the difference between dependent proportions, which is calculated as follows (Agresti, 2002: 410-411):

Impervious surface mapping
The stepwise linear regression analysis produced only one significant model for each time step for 1987 and 1996, with the NDVI variable as the only significant predictor (p < 0.001).
For 2005 and 2012, the stepwise procedure resulted in four and two different models, respectively.For reasons of parsimony and correspondence with the other time steps, the model with only the NDVI predictor was used for these dates as well.It should be noted that R²adj for these models is relatively low: only 39 to 47% of the variation in vegetation fraction is explained by NDVI.However, despite this poor lack of fit, the mean errors constitute only a negligible to small negative (1987: -0.0113; 1996: -0.0079) or positive (2005: 0.0056; 2012: 0.0011) bias.Also, the mean absolute errors reach values of 16 to 17%, which are quite similar in magnitude to the average absolute errors reported in other impervious fractions sub-pixel estimation research (e.g.Esch et al., 2009;Weng et al., 2008).

Land-use classification
The multi-layer perceptron neural network classification of the original impervious surface cover map resulted in a land-use map depicting the residential, employment and green blocks for each date's urban area (Figure 1a shows the original 2012 classification for the Antwerp area).Mapping all misclassified blocks shows two major types of confusion, both related to residential areas: many blocks located in the city centres and in the urban fringe are incorrectly classified as employment, and low-density detached housing embedded in green surroundings often receives a green class label.Both misclassification types are quite logical: if the classifier comes across an urban block with a strong mix of residential and commercial/services functions that has been labelled as 'residential' in the reference map, the question arises whether 'employment' really is a violation of the truth, as we are in fact dealing with mixed land uses (e.g.Antwerp's eastern half of the city centre, see Figure 1a).With respect to the other source of confusion, it is clear that low-density, suburban housing in a green setting can be morphologically very similar to park and recreation areas (e.g. the villa quarters in Brasschaat in the northeastern corner of Figure 1a).In the same vein, large blocks containing only a school, hospital or few industrial infrastructure in a distinctly green environment are sometimes incorrectly identified as 'parks or recreation'.For these reasons, the class labels were amended to more appropriately represent the blocks belonging to them: 1) low-density residential (LRES), 2) medium-to-high-density residential (MHRES), 3) employment and mixed land uses (EMPL), 4) green and green embedded land uses (GREEN).

Magnitude and spatial distribution of uncertainty:
The modal frequency of each block for each uncertainty type's classification for 2012 is presented in Figure 1d, 1e and 1f.With respect to the procedure that only accounts for uncertainty in the multi-layer perceptron classification, the degree of uncertainty is very low, as illustrated by the high modal frequencies (Figure 1d).Accounting for impervious surface errors, however, provides a markedly different image in Figure 1e: in the crescent-shaped suburban belt around the city of Antwerp, the modal class is most often only selected 40 to 70 times, which seems to be related to the mixed land-use problem discussed in §4.2.Combining the classification uncertainty with the impervious surface estimation uncertainty entails the most profound impact, as illustrated by Figure 1f: the distinction between the lower frequency belt and the high frequency city centre, industrial and green blocks found in Figure 1e is still present, but less pronounced.The Wilcoxon signed rank tests confirm an (extremely) significant difference in modal frequencies between all approaches (p < 0.001, except for 2012, for which p equals 0.013 when comparing the impervious surface estimation with the classification), indicating a notably different contribution to and combined impact on the uncertainty in the derived land-use maps.

Robustness to uncertainty:
Regarding the procedure that only accounts for uncertainty in the classification, a perfect match for all blocks between its modal class map and the original classification was found for each date.As a consequence, the assumption of marginal homogeneity holds, which means in this case that uncertainty in the classification procedure does not affect the determination of the street blocks' land use at all.Considering the impervious surface estimation uncertainty, on the other hand, global marginal homogeneity is very strongly rejected (p < 0.001) for all dates.Changes in land-use class proportions are found to be almost exclusively statistically significant (Table 1).For example, the probability of being classified as low-density residential increases considerably for all dates (between 3 and 4% for 2012, 4 and 5% for 1987 and 1996, and 6 and 7% for 2005) when accounting for impervious surface estimation errors.Nevertheless, the most likely land-use map taking the impervious surface estimation uncertainty into account for 2012 (Figure 1b) shows hardly any visual difference with the original classification (Figure 1a).With respect to the combined impact, global marginal homogeneity is again very strongly rejected (p < 0.001) for all dates.Also, there are no major changes in the difference in proportions as compared to the approach when only impervious surface errors are taken into account, apart from three significance switches for the employment and mixed land uses class (Table 1).The 'most representative' classification for 2012 in Figure 1c, combining both sources of uncertainty, barely differs from the most likely classification obtained after simulation of impervious surface uncertainty.

Reflection on the uncertainty analysis results:
The similarity observed between the most likely land-use maps obtained from the uncertainty analysis as compared with the original classification (Figure 1a-c) seems to be at odds with the statistical analysis indicating significant changes in landuse class proportions when accounting for uncertainty.Regarding the latter, the very large sample of blocks used (34 512) yields very narrow confidence interval widths and therefore relatively small differences in proportion are quite easily considered to be significant.Furthermore, the change of class label mainly applies to (a proportionally small number of) small blocks, which explains the strong spatial correspondence between the most likely land-use maps obtained from the uncertainty analysis and the original classification.The highly stable land-use class assignment when accounting for uncertainty indicates a distinct robustness to uncertainty of the land-use mapping strategy.However, this stability does not necessarily imply a low level of uncertainty in the derived land-use maps.Given the inherent complexity of the urban  Table 1.Difference in proportions relative to the original classification for all land-use classes for the approach only accounting for impervious surface errors (left part) and for the approach combing both sources of error (right part).The 95% confidence intervals are defined between brackets; differences significant at the 0.05 level of significance are indicated with a *.
landscape, there clearly is ambiguity in the land-use mapping process, due to the classification approach used, as well as the mixed occurrence of classes.Regarding the impervious fraction errors, the question remains whether there is still room for subpixel estimation improvement, since most studies reach a similar level of accuracy as obtained in this research.With respect to the classification procedure, multi-layer perceptron neural networks already belong to the more sophisticated classifiers, which makes a reduction of uncertainty in this stage appear less evident.Therefore, accepting the presence of uncertainty in the land-use maps and incorporating it in the land-use change model calibration seems the most logical way to proceed.This will be done in this research by developing a particle filter based data assimilation framework, aiming to improve the reliability of land-use change model predictions.

CONCLUSIONS
Current land-use change model calibration methods do not take into account uncertainties associated with the parameterization of the model and with the land-use data used as a reference.This research focused on the latter part by analysing the impact of uncertainty on land-use maps, obtained through a remote sensing interpretation chain involving sub-pixel estimation of impervious surface cover and classification of spatial metrics describing urban form to infer urban land use, by means of a Monte Carlo simulation approach.It was found that only accounting for classification uncertainty leads to low uncertainties in the land-use mapping results obtained, while by taking into account uncertainty in impervious surface estimation, higher levels of uncertainty occur in zones with a mix of residential and employment land uses, particularly in the city centre and in the urban fringe.The most likely landuse maps obtained from the simulation correspond very strongly with the original land-use classification, which points to a stable land-use mapping strategy, but does not exclude uncertainty in the mapping approach.Given the restrictions with respect to the reduction of uncertainty in the remote sensing interpretation chain, a particle filter data assimilation approach will be developed, allowing us to properly deal with the uncertainty present in the derived land-use maps by incorporating this uncertainty in the land-use change model calibration, thus aiming to improve the reliability of future land-use predictions.

:
To account for the uncertainty associated with the multi-layer perceptron classifier, information on the commission errors from the confusion matrix, obtained by comparing the original classification output with the reference land-use map for Flanders, was combined with local evidence on the identity of a street block in the form of class activation levels, produced by the MLP classifier, using a Bayesian approach.By weighing the global class-based probabilities derived from the confusion matrix using this local evidence, a set of posterior probabilities is obtained for each street block.Translating this in an explicit formula, we get (2) where C = the residential, employment or green class CMLP = the class as determined by the MLP classifier p(C|CMLP) = the a posteriori probability that a street block belongs to class C given its MLP-derived class p(CMLP|C) = the probability that the classifier assigns a block to class CMLP given that the block belongs to class C, as derived from the confusion matrix p(C) = the class membership value of class C produced by the MLP's output node activation transfer function, considered as the a priori probability of class C = the marginal proportions for the blocks labelled as the class under consideration in the modal class map resulting from the uncertainty analysis and in the original classification, respectively p12, p21 = the proportion of blocks labelled as the class under consideration in the original classification, but not in the modal class map, and vice versa, respectively α = the level of significance (i.e.0.05)

Figure 1 .
Figure 1.At the top: detail of Antwerp of the original land-use classification map (a) and of the modal class map when only accounting for the impervious surface errors (b) and when combining both uncertainty sources (c).At the bottom: the modal frequency maps of the classification errors only approach (d), of the impervious surface errors only approach (e) and of their combined impact (f).
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-2/W1, 2013 8th International Symposium on Spatial Data Quality , 30 May -1 June 2013, Hong Kong classification approach to infer urban land use from urban form, are submitted to a Monte Carlo simulation to evaluate their impact on uncertainty in the derived land-use maps.The robustness to uncertainty of the proposed land-use mapping strategy is assessed through a comparison of the most likely land-use map obtained from the simulation with the original classification.The ultimate goal of the research is to reduce International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-2/W1, 2013 8th International Symposium on Spatial Data Quality , 30 May -1 June 2013, Hong Kong