ASSESSING MODIFIABLE AREAL UNIT PROBLEM IN THE ANALYSIS OF DEFORESTATION DRIVERS USING REMOTE SENSING AND CENSUS DATA

To identify the drivers of land use / land cover change (LUCC), the rate of change is often compared with environmental and socio-economic variables such as slope, soil suitability or population density. Socio-economic information is obtained from census data, which are collected for individual households but are commonly presented in aggregate form for geographical units such as municipalities. However, the results of statistical analyses are not independent of the scale and the spatial configuration of the units used to aggregate the information, a common problem known as the modifiable areal unit problem (MAUP). In this article, we evaluate how strong MAUP effects are in a study of deforestation drivers in Mexico at the municipality level. We combined socio-economic variables from the 2010 Census of Mexico with environmental variables and the rate of deforestation. Because the population census is available for each human settlement and the environmental variables are obtained from high-resolution spatial databases, it was possible to aggregate the information using spatial units ("pseudo municipalities") of different sizes and to observe the effect of scale and aggregation on the bivariate correlations (Pearson's r) between pairs of variables. We found that the MAUP produces variations in the results, and we observed some variable pairs and some configurations of the spatial units for which the effect was substantial.


INTRODUCTION
Land use/cover change (LUCC) is significant to a large range of aspects related to global environmental change and has received increasing attention from scientists and decision makers. Over the last decades, a broad range of studies have been carried out to monitor, evaluate and project LUCC, with a particular emphasis on deforestation. Many studies of LUCC are based on remote sensing and census data using spatial analysis approaches. Multidate images are classified in order to monitor LUCC, and spatial variables expected to be the drivers of change are integrated in a GIS database. Then, the rate of change (e.g. the rate of deforestation) is often compared with environmental and socio-economic variables such as slope, soil suitability or population density in order to identify and assess the effects of the drivers by means of a statistical index. Socio-economic information is obtained from census data, which are collected for individual households but are commonly presented in aggregate form for geographical units such as counties, municipalities or states. A common problem is that the results of statistical analyses depend on the scale and the spatial configuration of the units used to aggregate the information. According to Openshaw (1984), this problem, known as the modifiable areal unit problem (MAUP), has two components: the scale problem and the aggregation (or zoning) problem. The scale problem is the variation in results observed when the data are aggregated into sets of increasingly larger units of analysis. The zoning problem is the variation in results observed when the analysis is carried out with alternative configurations of the same number of units. Some works indicate that the MAUP can cause correlations to vary from -1 to +1 through judicious placement of zone boundaries (Openshaw, 1984; Openshaw and Rao, 1995). However, Flowerdew (2011) used a large data set from the English Census and did not find large differences between correlations at different scales in the majority of cases.
In Mexico, most census information is available at the municipal level. In 2010, there were 2456 municipalities, whose area ranges from a few km² to more than 53,000 km², with an average of 796 km². The objective of this study is to evaluate how strong MAUP effects are on the assessment of deforestation drivers in Mexico using municipality-based data.
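The two components of the MAUP can be illustrated with a toy example. The sketch below uses eight made-up locations (hypothetical values, not data from this study) and compares the Pearson correlation at the individual level with the correlation obtained after averaging the values within two alternative zonings that have the same number of units. The function names `pearson_r` and `aggregate` are illustrative only.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def aggregate(values, groups):
    """Mean of `values` within each group of indices (one group per unit)."""
    values = np.asarray(values, dtype=float)
    return [values[list(g)].mean() for g in groups]


# Hypothetical values of two variables at eight neighbouring locations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Two zonings with the same number of units (four) but different boundaries.
zoning_a = [(0, 1), (2, 3), (4, 5), (6, 7)]
zoning_b = [(0,), (1, 2), (3, 4), (5, 6, 7)]

r_individual = pearson_r(x, y)
r_zoning_a = pearson_r(aggregate(x, zoning_a), aggregate(y, zoning_a))
r_zoning_b = pearson_r(aggregate(x, zoning_b), aggregate(y, zoning_b))

print(f"individual: {r_individual:.3f}")
print(f"zoning A:   {r_zoning_a:.3f}")
print(f"zoning B:   {r_zoning_b:.3f}")
```

In this toy case, aggregation raises the correlation relative to the individual level (scale effect), and the two zonings give different values although they contain the same number of units (zoning effect).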

MATERIALS AND METHODS
We used socio-economic variables from the 2010 Census of Mexico, provided by the National Institute of Statistics and Geography (INEGI, 2010) at the human settlement level, along with the marginalisation index calculated by the National Commission of Population (CONAPO, 2010) using information on housing, schooling and income from INEGI. We also used topographic indices (slope and elevation) obtained from the Shuttle Radar Topography Mission digital elevation model (http://www2.jpl.nasa.gov/srtm/) and the forest tree cover and forest loss data from the Global Forest Change database (http://earthenginepartners.appspot.com/science-2013-global-forest; Hansen et al., 2013). Table 1 shows the source and the resolution of the variables used in the study. All spatial and statistical analyses were carried out using the open source program R (R Core Team, 2013; Hijmans, 2015).
The study area covers about 111,360 km² in the central part of Mexico. Based on the average municipal area, the expected number of municipalities for this area is 140. To test the zoning effect of the MAUP, we generated Thiessen polygons around 140 random points. Each Thiessen polygon was used as an analysis unit ("pseudo municipality"). For each unit, we computed the average elevation, average slope, population density, proportion of illiterate population, proportion of houses with dirt floors and the rate of deforestation, defined as the proportion of forest (tree cover > 10%) that was lost during 2000-2012. We then calculated the bivariate correlations (Pearson's r) between pairs of variables. To evaluate the zoning effect, this experiment was repeated 20 times so that the variation of the correlation values with the configuration of the units could be assessed. To evaluate the scale effect, the number of Thiessen polygons was set to one quarter, one half, twice and four times the expected number of municipalities (35, 70, 280 and 560 units), taking into account the average municipal area in Mexico. The variation of the Pearson correlation values due to zoning and scale effects was assessed by means of the coefficient of variation.
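The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study, which was written in R. Here Python with NumPy is used instead, the input is a synthetic grid of cells with made-up slope and deforestation values, and the Thiessen units are obtained by assigning each cell to its nearest random seed point rather than by building polygon geometries. All function names are hypothetical.

```python
import numpy as np


def expected_units(study_area_km2, mean_unit_area_km2):
    """Number of units expected from the average municipal area."""
    return round(study_area_km2 / mean_unit_area_km2)


def thiessen_labels(cells_xy, seeds_xy):
    """Assign each cell to its nearest seed (equivalent to Thiessen polygons)."""
    d2 = ((cells_xy[:, None, :] - seeds_xy[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def unit_means(values, labels, n_units):
    """Mean of `values` per unit; units that contain no cell are dropped."""
    counts = np.bincount(labels, minlength=n_units)
    sums = np.bincount(labels, weights=values, minlength=n_units)
    keep = counts > 0
    return sums[keep] / counts[keep]


def zoning_experiment(cells_xy, x, y, n_units, n_repeats, rng):
    """Pearson's r between unit means of x and y for several random zonings."""
    lo, hi = cells_xy.min(axis=0), cells_xy.max(axis=0)
    r_values = []
    for _ in range(n_repeats):
        seeds = rng.uniform(lo, hi, size=(n_units, 2))
        labels = thiessen_labels(cells_xy, seeds)
        x_units = unit_means(x, labels, n_units)
        y_units = unit_means(y, labels, n_units)
        r_values.append(np.corrcoef(x_units, y_units)[0, 1])
    return np.array(r_values)


def coefficient_of_variation(r_values):
    """Sample standard deviation as a percentage of the absolute mean."""
    return 100.0 * r_values.std(ddof=1) / abs(r_values.mean())


# Synthetic 40 x 40 grid (assumption: stands in for the real raster data).
rng = np.random.default_rng(0)
side = 40
gx, gy = np.meshgrid(np.arange(side), np.arange(side))
cells = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
slope = cells[:, 0] / side + rng.normal(0.0, 0.3, cells.shape[0])
deforestation = 1.0 - slope + rng.normal(0.0, 0.3, cells.shape[0])

n_units = expected_units(111_360, 796)  # 140 units, as in the text
r_values = zoning_experiment(cells, slope, deforestation, n_units, 20, rng)
print(n_units, r_values.min(), r_values.max(), coefficient_of_variation(r_values))
```

Changing `n_units` to 35, 70, 280 or 560 reproduces the structure of the scale experiment on the same synthetic data. With real data, population density would be computed from unit totals and unit areas rather than from cell means.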

RESULTS
Figure 1 shows the first configuration of the 140 spatial units overlaid on the digital elevation model. Table 2 shows the variation of the correlation coefficient due to the zoning effect: the coefficient of variation ranges between 3 and 600% depending on the pair of variables involved. However, high values of the coefficient of variation correspond to weak correlations: when the Pearson coefficient exceeds 0.5, the coefficient of variation is below 10%. It can also be observed that the minimum and maximum values of the coefficient often differ markedly from the mean. These differences mean that some specific configurations of the aggregation units can lead to very contrasting results in the statistical analysis. Figures 2 and 3 are box plots showing the variation of the Pearson coefficient between the rate of deforestation and, respectively, the slope and the marginalisation index. In the case of slope, we used the absolute value of the correlation coefficient to make the graph easier to interpret. The variation of the correlation value is due to the change in the number of units (scale effect). As Fotheringham and Wong (1991) noted, the correlation coefficient increases when the analysis is based on larger units because averaging has a smoothing effect, so that the variation of a variable tends to decrease as the aggregation increases. The box plots also show outlier correlation values, which correspond to particular configurations of the units that produce extreme values of correlation. The results obtained using 35, 70, 280 and 560 spatial units are presented in the appendix.
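The link between a high coefficient of variation and a weak correlation follows from the definition of the index, in which the standard deviation is divided by the mean. The short sketch below uses hypothetical correlation values (not taken from Table 2) with the same spread around a strong and a near-zero mean; the function name is illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the absolute mean."""
    return 100.0 * statistics.stdev(values) / abs(statistics.mean(values))


# Hypothetical correlations over three zonings, both with a spread of 0.02.
r_strong = [0.58, 0.60, 0.62]   # mean 0.60
r_weak = [-0.01, 0.01, 0.03]    # mean 0.01

cv_strong = coefficient_of_variation(r_strong)
cv_weak = coefficient_of_variation(r_weak)
print(f"strong correlation: CV = {cv_strong:.1f}%")
print(f"weak correlation:   CV = {cv_weak:.1f}%")
```

The same absolute spread gives a coefficient of variation of about 3% for the strong correlation and about 200% for the weak one, so very large percentages should be read together with the mean and the minimum and maximum values.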

DISCUSSION AND CONCLUSION
In this study, we observed the smoothing effect related to scale. For simple statistical analyses such as correlation analysis and linear regression, such variations can be expected theoretically and are therefore relatively well understood (Fotheringham and Wong, 1991; Jelinski and Wu, 1996). In contrast, the zoning problem is more complex and much less well understood, even for simple statistical analyses. In the present study, we observed unpredictable results related to some specific configurations of the units used to compute the indices. Figure 4 shows four different spatial configurations of the same number of aggregation units ("pseudo municipalities") and helps to explain this behaviour. In the two top panels (a and b), the cluster of deforestation patches belongs to a single unit, a large one in the upper left panel (a) and a small one in the upper right panel (b), leading to a moderate and a high rate of deforestation, respectively, for the corresponding unit. In the two bottom panels (c and d), the cluster of deforestation is distributed among three and four aggregation units, leading to more even rates of deforestation. Some studies reported that correlations varied from -1 to 1 due to the MAUP (Openshaw and Rao, 1995). However, these correlations were obtained using highly convoluted and therefore implausible boundaries between units. In the present study, boundaries are simpler than true boundaries because of the use of Thiessen polygons. However, as the generating point of each unit is random, the pseudo municipalities are not realistic. For instance, some units can encompass little or, in some cases, no population at all. In future research, we will examine the effect of such unrealistic features by designing units around randomly chosen existing settlements with a minimum population, which will serve as the municipality seat (administrative center) and as the generating point of the spatial unit of analysis. We found that, in most cases, the MAUP does not make a large difference to the results, as also reported in some previous studies. However, we observed some variable pairs and some specific configurations where the effect was substantial. In future research, we will assess the effect of the MAUP in global and local regression models and evaluate the potential solutions reviewed by Dark and Bram (2007).

Figure 4: Four configurations of aggregation units ("pseudo municipalities") with the same number of units (zoning effect)
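The behaviour illustrated in Figure 4 can be reproduced with simple arithmetic. The sketch below uses made-up areas (they are assumptions, not measurements from the study) for a single cluster of forest loss that falls either inside one unit or across three units; the function name is hypothetical.

```python
def deforestation_rate(loss_km2, forest_km2):
    """Proportion of a unit's forest area that was lost."""
    if forest_km2 <= 0:
        raise ValueError("unit has no forest")
    return loss_km2 / forest_km2


# Hypothetical cluster of 30 km2 of forest loss.
cluster_loss = 30.0

# Cluster entirely inside one large unit (600 km2 of forest).
rate_large_unit = deforestation_rate(cluster_loss, 600.0)

# Cluster entirely inside one small unit (100 km2 of forest).
rate_small_unit = deforestation_rate(cluster_loss, 100.0)

# Cluster split evenly among three units (300 km2 of forest each).
rates_split = [deforestation_rate(cluster_loss / 3, 300.0) for _ in range(3)]

print(rate_large_unit, rate_small_unit, rates_split)
```

With these numbers the same cluster yields a rate of 5% in the large unit, 30% in the small unit and about 3% in each of the three units that share it, so the value attached to a unit depends strongly on where the boundaries fall.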

Figure 1: Limits (red) of the 140 random spatial units ("pseudo municipalities") overlaid on the digital elevation model (grey scale)

Figure 2: Variation of the Pearson coefficient between the deforestation rate and the slope due to change in the number of units

Table 1: Input variable characteristics

Table 2: Minimum, maximum, mean, standard deviation and coefficient of variation of the values of the Pearson coefficient of correlation with 140 units (zoning effect)

Table 4: Minimum, maximum, mean, standard deviation and coefficient of variation of the values of the Pearson coefficient of correlation with 70 units (zoning effect)

Openshaw, S., 1984. The modifiable areal unit problem. Geo Books, Norwich, England.

Table 5: Minimum, maximum, mean, standard deviation and coefficient of variation of the values of the Pearson coefficient of correlation with 280 units (zoning effect)

Table 6: Minimum, maximum, mean, standard deviation and coefficient of variation of the values of the Pearson coefficient of correlation with 560 units (zoning effect)