EXPLORING SCALE EFFECT USING GEOGRAPHICALLY WEIGHTED REGRESSION ON MASS DATASET OF URBAN ROBBERY

Urban geographers have long sought to explain the factors that influence crime, but their cases are limited to their own study areas. Researchers commonly agree that explanatory variables that model crime in one case may be irrelevant in another. However, none of them tested the significance of these variables across different scales of the same study area, because their data did not allow analysis at more than one scale. This research examines the scale effect using varied data from a wide range of sources. After the data are organized with Geographical Information System (GIS) technologies, the Geographically Weighted Regression (GWR) method is used to explain that effect. The explanatory variables identified at district scale differ from those identified at grid scale. Hence, the explanatory variables may change not only between geographical areas but also between scales of the same area.


INTRODUCTION
The growing number of big cities intensifies social, cultural and economic interactions among inhabitants. Some of these interactions are negative, and crime is one of them. Crime types such as robbery and assault rise with the increase in population and interactions (Ackerman, 1998; General Police Department of Anti-Smuggling and Organized Crime, 2012). Much research therefore focuses on explaining the quantitative increase in crime with data, with the aim of making cities more secure. A mass dataset drawn from a wide range of sources requires the management of various data formats; once organized, such data can help explain crime by revealing crime patterns.
Because "place" plays a vital role in understanding crime (Chainey & Ratcliffe, 2005), earlier studies focused on crime mapping, that is, the visual presentation of crime data to look for patterns. Crime events are not distributed randomly in space (Brantingham & Brantingham, 1997; Alpdemir & Çabuk, 2005), and various factors can influence where they occur. Crime mapping alone is therefore not enough for crime prevention; the factors behind crime must also be identified and mapped. Hence, urban geographers studied the ecology of crime, that is, the relationship between crime and the built environment, land use, etc. (Olligschlaeger, 1997).
The socioeconomic variables used in such studies change slowly over time and cannot explain short-term variations in crime rates. However, daily and hourly fluctuations in crime are far larger than year-to-year shifts (Cohn, 1990). Spatiotemporal variables that might also contribute to an increase in crime have therefore been investigated. Studies conducted from the 1980s onwards found significant linear relationships between ambient temperature and crime (Anderson & Anderson, 1984; Cohn, 1990; Field, 1992; Salleh et al., 2012). According to Cohn (1990), patterns in the crime rate are also linked to temporal and ecological factors such as surface temperature, sunlight, rain and wind, because these factors induce stress and discomfort among inhabitants, which are related to crime occurrences (Salleh et al., 2012).
Ecology of crime studies began with paper maps. However, as population growth raises crime rates, paper maps alone no longer suffice to explain crime patterns. A wide range of data and a data management system are essential for identifying the factors that influence crime patterns. Geographical Information System (GIS) technologies make it possible to store, manage and visualize both spatial and non-spatial data (Longley et al., 2001; Tecim, 2008) and to extract information from mass data (Thurston, 2002). GIS can associate many forms of information derived from various data sources with specific locations (Taylor & Blewitt, 2006) and gives users continuous interaction with the data. GIS technologies have therefore been used in ecology of crime studies.
Researchers used data to explain crime in cases limited to their study areas, so their inferences are limited to those areas. This limitation is an important consequence of the Modifiable Areal Unit Problem (MAUP): the use of different areal units may influence the results of ecology of crime studies (Openshaw, 1984). Although studies explain crime with several variables, these explanatory variables are not identical in every case; they vary from country to country and even from region to region within a country. In addition, to our knowledge, no study has attempted to make inferences on the same study area while changing the scale, and thereby to test whether the significance of the explanatory variables changes.
In this research, we investigate whether the same set of variables explains robbery events at two different scales. Socioeconomic, spatiotemporal and geographical explanatory variables are identified at the district and grid scales of the study area by using Geographically Weighted Regression (GWR). GWR allows different relationships to exist at different points in space by calibrating a multiple regression model locally, and it can be used to investigate local trends and nonstationarity in regression models (Brunsdon et al., 1996). The research is carried out on Izmir, one of the world's urban agglomerations (UN, 2011), to determine whether the explanatory variables change with the scale of the study area.

METHODOLOGY
Various types of data on the urban area of Izmir were obtained from different data sources, covering the full month of August 2010. August is the period in which robberies are most frequently reported, which also applies to our case (Salleh et al., 2012). The methodology comprises the organization of the acquired data and the design of the organized data so that they can be used at both the district and grid scales of the study area. GIS technologies are used to organize, store and visualize the data by integrating information from a variety of sources into one user interface (Olligschlaeger, 1997). All existing data were transferred to a spatial database in order to carry out spatial queries and analysis. The arrows in Figure 1 are labelled with the methods used.
Temperature and wind speed are interpolated with the Inverse Distance Weighting (IDW) method:

\[ Z_i = \frac{\sum_j Z_j \, d_{ij}^{-k}}{\sum_j d_{ij}^{-k}} \qquad (1) \]

where
Z_i = the temperature/wind speed at location i
Z_j = the temperature/wind speed at sampled location j
d_ij = the distance from i to j
k = the inverse distance weighting power (k = 2)

Meteorology stations collect temperature and wind speed values every hour. However, critical changes in meteorological data occur at 07:00 (morning), 14:00 (noon) and 21:00 (night) (Turkish State Meteorological Service, 2012). The IDW method is therefore applied to the temperature and wind speed values of every day between the 1st and 31st of August 2010 at morning, noon and night, respectively (see Appendix 1).
The method produces a raster-based map. Each robbery event is assigned the value of the raster pixel that contains its point location.
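The interpolation step can be illustrated with a minimal sketch of inverse distance weighting with power k = 2. The station coordinates and readings below are invented for illustration, planar coordinates are assumed, and the function name `idw` is hypothetical; the study itself produced raster maps in a GIS.

```python
import math

def idw(x, y, stations, k=2):
    """Inverse distance weighted estimate at location (x, y).

    stations: list of (xj, yj, zj) tuples, where zj is the measured
    temperature or wind speed at sampled location j.
    """
    num = 0.0
    den = 0.0
    for xj, yj, zj in stations:
        d = math.hypot(x - xj, y - yj)
        if d == 0.0:
            return zj  # the location coincides with a station
        w = 1.0 / d ** k  # weight falls off with distance to the power k
        num += w * zj
        den += w
    return num / den

# Hypothetical stations: (x, y, temperature in degrees Celsius)
stations = [(0.0, 0.0, 30.0), (10.0, 0.0, 34.0)]

# Halfway between the two stations both weights are equal,
# so the estimate is the plain average of the two readings.
print(idw(5.0, 0.0, stations))
# Closer to the first station, the estimate moves towards 30.
print(idw(1.0, 0.0, stations))
```

Evaluating the function at every pixel centre would give the raster described in the text; each robbery point then takes the value of the pixel it falls in.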
As the geographical variables of the study, point data of facilities were provided by Izmir Metropolitan Municipality. The data include gas stations, public transportation locations (bus stations, terminal, gang board, etc.), self-help centers, banking, police stations, military and recreation sites, as well as religious, educational, industrial, commercial, social, cultural, historical and tourist facilities and government agencies. This dense dataset consists of 4632 individual facility points.
Figure 2. Overlay of robbery events with trade (banking and commercial facility locations), education and public transportation stations
We selected the facilities that cover almost all crime locations by setting a buffer distance of 150 meters. This distance was chosen empirically by increasing the buffer distance in 50-meter intervals. Facilities that do not fulfil this buffer criterion are excluded from the analysis. We found that trade (commercial and banking together), education and transportation locations provide the best coverage of the robbery events (see Figure 2). Thus, these facilities, together with police stations, which provide the security of the city, are taken as the geographical explanatory variables of the study.
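The empirical choice of the buffer distance can be sketched as follows: for each candidate distance, compute the share of robbery events that lie within that distance of at least one facility, stepping in 50 m increments. The coordinates are invented, planar coordinates in meters are assumed, `coverage` and `smallest_covering_distance` are hypothetical helpers, and the 95% target is an assumption standing in for "almost all".

```python
import math

def coverage(events, facilities, radius):
    """Share of events within `radius` meters of at least one facility."""
    covered = 0
    for ex, ey in events:
        if any(math.hypot(ex - fx, ey - fy) <= radius for fx, fy in facilities):
            covered += 1
    return covered / len(events)

def smallest_covering_distance(events, facilities, target=0.95,
                               step=50, max_radius=1000):
    """First buffer distance (in `step` increments) that reaches `target`."""
    radius = step
    while radius <= max_radius:
        if coverage(events, facilities, radius) >= target:
            return radius
        radius += step
    return None  # target not reached within max_radius

# Hypothetical robbery events and facility points (meters)
events = [(0, 0), (100, 0), (0, 140), (300, 300)]
facilities = [(0, 0), (290, 290)]

for r in (50, 100, 150):
    print(r, coverage(events, facilities, r))
print(smallest_covering_distance(events, facilities))
```

Running the same comparison per facility type would show which types cover the events best at the chosen distance.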
Socioeconomic variables such as district-level population, migration, education and age were obtained from census data provided by the Turkish Statistical Institute. Data were extracted by querying the database for the male primary school graduate population and the male population aged 18-35. We adopted these queries because robbery events are encountered more frequently among these population groups (Turkish Penal Institution, 2008). For the grid scale analysis, population data must be calculated for small spatial units. For this purpose, population per building is extracted from the vector-based building footprints and heights and the raster-based land use maps of Izmir Metropolitan Municipality.
Land use maps are crucial for identifying residential areas in order to calculate individual building populations. Vector maps containing building footprints and heights were overlaid onto the land use maps, and residential buildings were identified accordingly. Other land use types were thus eliminated from the analysis (see Figure 3). Building footprint size is the key attribute for extracting the number of apartments in a building and determining the usage of that building. For instance, some parcels include residential utility buildings such as garages, sheds and barns (Ural et al., 2011), which are called "premises".
Because the available land use maps show land use only at parcel level, the buildings used as premises inside a parcel cannot be identified directly. Therefore, area thresholds are applied to classify residential building usages. An area threshold of 60 m² is applied to identify premises. The threshold values for residential buildings are given in Table 1 (Ural et al., 2011). The average population density is calculated as 0.26 people/100 m² for districts and 3.49 for buildings. The mean of these two densities (1.875), rounded to two, is taken as the constant household size. The population of a building is calculated by multiplying the household size, the number of apartments and the height of the building. With this methodology the population of the included districts is calculated as 3,364,476, which agrees with the independently provided district population of 3,479,507 at the 96.7 percent level. Building population is therefore used to calculate population-based data in each grid cell.
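The building population rule and the agreement figure can be checked with a short sketch. It assumes that the number of apartments is counted per floor, that building height is expressed in floors, and that footprints below the 60 m² threshold are premises; none of these details is stated explicitly in the text, and the function names are hypothetical.

```python
HOUSEHOLD_SIZE = 2            # constant from the text (mean density rounded to two)
PREMISES_THRESHOLD_M2 = 60.0  # area threshold from the text

def is_premises(footprint_m2):
    # Assumption: footprints below the threshold are premises
    # (garages, sheds, barns) and carry no population.
    return footprint_m2 < PREMISES_THRESHOLD_M2

def building_population(footprint_m2, apartments, height):
    # Assumption: `apartments` is per floor and `height` is in floors.
    if is_premises(footprint_m2):
        return 0
    return HOUSEHOLD_SIZE * apartments * height

def agreement_percent(estimated, reference):
    """Estimated population as a percentage of the reference population."""
    return 100.0 * estimated / reference

# Hypothetical building: 180 m2 footprint, 2 apartments per floor, 5 floors
print(building_population(180.0, 2, 5))                    # 20
# Totals reported in the text
print(round(agreement_percent(3_364_476, 3_479_507), 1))   # 96.7
```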

ANALYSIS
Unlike conventional regression, which produces a single regression equation to summarize the relationships between the explanatory and dependent variables, GWR captures spatial variation in these relationships by producing multiple regression equations (Mennis, 2006). The GWR method reveals spatial relationships between different spatial units over space (Brunsdon et al., 1996). To explore the scale effect, the spatial relationships between the rate of robbery and four socioeconomic, two spatiotemporal and five geographical variables are included in the GWR model. Two scales, grid and district, are used to explain the differences and similarities between the geographic locations of the robbery events.
Distances from banking, education, commerce and police stations and the number of public transportation stations are the geographical variables; temperature and wind speed are the spatiotemporal variables; population density, the male primary school graduate population, the male population aged 18-35 and migration are the socioeconomic variables of this research. For the district scale analysis, descriptive statistics of the variables per district are presented in Table 2. The population is denser where Izmir bay ends. Some variables, such as population density, the male primary school graduate population and the number of public transportation stations, decrease away from the bay area towards inland, whereas the remaining variables increase.
Before constructing the GWR model, Ordinary Least Squares (OLS), a well-known linear regression method, is applied to assess the significance of the explanatory variables sequentially. Then GWR is carried out on robbery data from the 21 districts of Izmir. The separate regression equation for each district is

\[ y_i = \beta_0(u_i, v_i) + \sum_k \beta_k(u_i, v_i) \, x_{ik} + \varepsilon_i \qquad (2) \]

where
y_i = i-th observation of the dependent variable, which is the robbery count in each district
β_0 = intercept
β_k = parameter to be estimated for variable k
x_ik = covariate value of the k-th variable for the i-th observation
(u_i, v_i) = coordinate location of the i-th observation
ε_i = error term

Considering the GWR at district scale, distance from education, the number of public transportation stations, wind speed and all the socioeconomic variables explain robbery in the study area at the 95% significance level. In the northern and southern parts of the study area the relationship is significant even at the 99.5% level. Along the northwest-southeast direction the significance level partially decreases. On the other hand, temperature and the distances from banks, commerce and police stations have no explanatory significance for robbery at district scale.
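The idea of a separate regression equation at each location can be sketched as a weighted least squares fit in which nearby observations receive larger weights. The Gaussian kernel, the fixed bandwidth and the toy data are assumptions made for illustration (the text does not state which kernel or bandwidth was used), and `solve` and `gwr_coefficients` are hypothetical helper names.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gwr_coefficients(coords, X, y, target, bandwidth):
    """Local coefficients [b0, b1, ...] at target = (u, v).

    Each observation j is weighted by a Gaussian kernel of its
    distance d_j to the target: w_j = exp(-0.5 * (d_j / bandwidth) ** 2).
    """
    p = len(X[0]) + 1  # intercept plus covariates
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for (u, v), xj, yj in zip(coords, X, y):
        d = math.hypot(u - target[0], v - target[1])
        w = math.exp(-0.5 * (d / bandwidth) ** 2)
        row = [1.0] + list(xj)
        for a in range(p):
            xtwy[a] += w * row[a] * yj
            for b in range(p):
                xtwx[a][b] += w * row[a] * row[b]
    return solve(xtwx, xtwy)

# Toy data: ten locations on a line; the slope is 1 in the west, 3 in the east.
coords = [(float(i), 0.0) for i in range(10)]
X = [[float(i % 5 + 1)] for i in range(10)]
y = [(1.0 if i < 5 else 3.0) * X[i][0] for i in range(10)]

west = gwr_coefficients(coords, X, y, target=(1.0, 0.0), bandwidth=1.0)
east = gwr_coefficients(coords, X, y, target=(8.0, 0.0), bandwidth=1.0)
print("west slope:", west[1])
print("east slope:", east[1])
```

A single OLS fit over all ten points would return one compromise slope, whereas the local fits recover a slope close to 1 in the west and close to 3 in the east, which is the kind of nonstationarity GWR is meant to expose.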
We concentrated our grid scale analysis on the most densely populated area. To determine the explanatory variables of robbery at grid scale, grid cells are used. The cell size is set to 400 x 400 m, and the grid is superimposed on the study area as shown in Figure 5. This size is not small enough to predict the exact location of future crime, but it is sufficient to indicate the risky locations in the study area.
Grid cell midpoint coordinates are used to measure the distances to the facilities, namely DB, DC, DE and DS. NT is the count of public transportation stations in a grid cell. POP is the total housing population in a grid cell, computed per building with the help of the land use map. SG, AM and M are derived from the district-level data according to the household population in a grid cell. T and W are the averages of the temperature and wind speed values assigned to the robbery events in a grid cell, respectively.
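The grid variables based on cell geometry can be sketched as follows: a point is mapped to its 400 m cell, the cell midpoint is computed, the nearest facility distance is measured from that midpoint, and stations are counted per cell. The grid origin at (0, 0), the planar coordinates in meters and all function names are assumptions made for illustration.

```python
import math

CELL = 400.0  # grid cell size in meters, as in the text

def cell_index(x, y, x0=0.0, y0=0.0, cell=CELL):
    """Column and row of the grid cell containing point (x, y)."""
    return int((x - x0) // cell), int((y - y0) // cell)

def cell_midpoint(col, row, x0=0.0, y0=0.0, cell=CELL):
    """Midpoint coordinates of the grid cell (col, row)."""
    return x0 + (col + 0.5) * cell, y0 + (row + 0.5) * cell

def nearest_distance(point, facilities):
    """Distance from `point` to the closest facility (e.g. DB, DC, DE, DS)."""
    return min(math.hypot(point[0] - fx, point[1] - fy) for fx, fy in facilities)

def count_in_cell(points, col, row):
    """Number of points falling in cell (col, row) (e.g. NT)."""
    return sum(1 for x, y in points if cell_index(x, y) == (col, row))

# Hypothetical bank and station locations (meters)
banks = [(200.0, 200.0), (1000.0, 1000.0)]
stations = [(10.0, 10.0), (399.0, 399.0), (400.0, 0.0)]

mid = cell_midpoint(1, 0)
print(mid, nearest_distance(mid, banks))
print(count_in_cell(stations, 0, 0))
```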
The grid system is characterized for the grid area as follows:
i = 1, 2, …, n: index of grid cells
C_i = (c_1i, c_2i): the coordinates of the midpoint of the i-th grid cell
POP_i = p_1, p_2, …, p_i: population density in the i-th grid cell
SG_i = g_1, g_2, …, g_i: male primary school graduate population in the i-th grid cell
AM_i = a_1, a_2, …, a_i: male population aged 18-35 in the i-th grid cell
M_i = m_1, m_2, …, m_i: migration in the i-th grid cell
T_i = t_1, t_2, …, t_i: average temperature (°C) in the i-th grid cell
W_i = w_1, w_2, …, w_i: average wind speed (km) in the i-th grid cell
NT_i = n_1, n_2, …, n_i: number of public transportation stations in the i-th grid cell
DB_i = b_1, b_2, …, b_i: nearest neighboring distance of banking to the centroid of the i-th grid cell
DC_i = dc_1, dc_2, …, dc_i: nearest neighboring distance of commerce to the centroid of the i-th grid cell
DE_i = e_1, e_2, …, e_i: nearest neighboring distance of education to the centroid of the i-th grid cell
DS_i = s_1, s_2, …, s_i: nearest neighboring distance of police stations to the centroid of the i-th grid cell

Population density, distances from banks, education, commerce and police stations, the number of public transportation stations and wind speed are identified as explanatory variables at grid scale (Appendix 3a). These variables explain 35% of the variance in robbery events. Applying spatial autocorrelation analysis (Moran's Index) at grid scale shows clearly that robbery counts per grid cell are not randomly distributed over the study area (MI = 0.32, Z-score = 17.9). "What happens in the local areas surrounding them" is also crucial for identifying the existence of relationships (Brantingham & Brantingham, 1997). Kernel density estimation revealed that robbery events pile up at some locations, which are shown in black in Appendix 3b. These locations are the reason for the decrease in the significance of the explanatory variables.
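The spatial autocorrelation check can be illustrated with a minimal computation of global Moran's I on a small grid. Binary rook contiguity weights (cells sharing an edge are neighbors) are an assumption, since the text does not state the weighting scheme, and the counts are invented. The Z-score is not computed in this sketch.

```python
def rook_neighbors(rows, cols):
    """Map each cell index to the indices of the cells sharing an edge with it."""
    nb = {}
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            nb[i] = [rr * cols + cc
                     for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= rr < rows and 0 <= cc < cols]
    return nb

def morans_i(values, neighbors):
    """Global Moran's I with binary weights w_ij = 1 for neighboring cells."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbors.values())  # sum of all weights
    num = sum(z[i] * z[j] for i, js in neighbors.items() for j in js)
    den = sum(zi * zi for zi in z)
    return (n * num) / (s0 * den)

nb = rook_neighbors(4, 4)

# Hypothetical robbery counts per cell, row by row
clustered = [1, 1, 0, 0] * 4        # events concentrated in the western half
checkerboard = [1, 0, 1, 0,
                0, 1, 0, 1,
                1, 0, 1, 0,
                0, 1, 0, 1]         # perfectly alternating pattern

print(morans_i(clustered, nb))      # positive: similar values are adjacent
print(morans_i(checkerboard, nb))   # negative: dissimilar values are adjacent
```

A value near zero would indicate a random arrangement; the positive index reported in the text corresponds to the clustered case.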
As Brantingham & Brantingham (1997) noted, the occurrence or increase of some crimes cannot be explained by population used as the denominator. According to our research, robbery events at district level can be explained predominantly by population and population-related socioeconomic variables. At grid scale, however, geographical variables predominantly explain robbery events. Hence, the research shows that the explanatory variables for robbery change with the scale of the study area.

CONCLUSION
An explanatory variable identified for crime as the dependent variable in one study might be irrelevant in another. Fotheringham et al. (2002) describe this situation as follows: "the measurement of a relationship depends in part on where the measurement is taken". Indeed, data obtained for one set of areal units may not retain their relative significance, not only for other study areas but also for different scales of the same study area. This research examines whether the explanatory variables change with the scale of the study area. The hypothesis of this research is that the dominant explanatory variables may change with the scale of the study area.
To explore the scale effect, the district and grid scales of the same study area are analysed with the GWR method. To identify the differences and similarities of the explanatory variables between the two scales, organizing the wide range of data is essential. Most data were obtained in spreadsheet format, and it is crucial to make them spatial. One of the main advantages of an integrated GIS is that all data have one common denominator: the xy coordinates. All data points are thus related to one another via coordinates or addresses, as well as other characteristics such as date and time. The adopted methods are then applied to the data to store them in a spatial database and use them in the analysis.
The explanatory variables are drawn from three kinds of independent variables, socioeconomic, spatiotemporal and geographical, which are evaluated in the GWR model. The results show that population-based variables dominate the explanation of robbery events at district scale. At grid scale, these variables are not sufficient to explain robbery events. However, when geographical variables are included in the model, the grid scale results become significant.

Figure 1. Organization, storage and visualization of data using the adopted methods
The data used in this research, containing 1344 robbery events, were obtained from Izmir Police Department in spreadsheet format. The data consist of the date, time, crime type and geographical location of each event. The robbery events had already been geocoded by the Police Department from the address information of the events.

Figure 3. Part of the vector map showing the residential areas and building footprints

Figure 4. Map of GWR at district scale
In Figure 4, deviations from the model in Equation 2 are plotted for each district.

Figure 5. 400 x 400 m cells for the grid scale area
For the grid scale analysis, descriptive statistics of the variables per grid cell are presented in Appendix 2. Compared to the explanatory variables at district scale, none of the socioeconomic variables except population density explains robbery occurrences at grid scale. Contrary to the district scale, distances from all facilities are correlated with robbery.