DEVELOPMENT OF TIME-SERIES HUMAN SETTLEMENT MAPPING SYSTEM USING HISTORICAL LANDSAT ARCHIVE

Automated human settlement mapping methods are urgently needed to exploit historical satellite data archives for pressing issues of global urban growth, such as disaster risk management, public health, food security, and urban management. Since global data with spatial resolutions of 10-100 m have been developed by several initiatives using ASTER, Landsat, and TerraSAR-X, the next goal is the development of time-series data, which can contribute to studies of urban development in the context of socioeconomics, disaster risk management, public health, transport, and other development issues. We developed an automated algorithm that detects human settlement by classifying built-up and non-built-up areas in time-series Landsat images. A machine learning algorithm, Learning with Local and Global Consistency (LLGC), was applied with improvements for remote sensing data. The algorithm can use MCD12Q1, a MODIS-based global land cover map with 500-m resolution, as training data, so that no manual process is required to prepare training data. In addition, we designed a method to composite multiple LLGC results into a single output to reduce uncertainty. Each LLGC result has a confidence value ranging from 0.0 to 1.0 that represents the probability of a pixel being built-up or non-built-up. The median confidence over a period around a target time was expected to be a robust output for identifying built-up or non-built-up areas despite uncertainties in satellite data quality, such as cloud and haze contamination. Four Landsat scenes with cloud contamination of less than 20% were chosen from the Landsat archive for each target year: 1990, 2000, 2005, and 2010. We implemented the algorithms as a system on the Data Integration and Analysis System (DIAS) at the University of Tokyo and processed 5200 Landsat scenes covering cities with more than one million inhabitants worldwide.


INTRODUCTION
Urban expansion is one of the most important development issues (Angel et al., 2005; Foley et al., 2005). Monitoring urban formation is needed for urban management, which is connected to other issues such as disaster risk management (Doocy et al., 2007; Dasgupta et al., 2009), public health (Brooker et al., 2006; Omumbo et al., 2005), transportation networks (Schneider et al., 2003), and food security (Balk et al., 2005). Because geographic data on urban development are rarely affordable for low-income and least-developed countries, innovative methods to develop such data are urgently needed.
Satellite-based human settlement mapping has contributed to better urban monitoring and planning (Schneider et al., 2003; Schneider and Woodcock, 2008). Automating the mapping is important for producing finer human settlement maps, which are especially needed in developing countries. Since global data with spatial resolutions of 10-100 m have been developed by several initiatives using ASTER (Miyazaki et al., 2014), Landsat (European Commission Joint Research Centre, 2014), and TerraSAR-X (Esch et al., 2013), the development of time-series data is a natural next goal, because such data can contribute to studies of urban growth processes (Taubenböck et al., 2014), which are closely connected with socioeconomics, disaster risk management, public health, transport, and other development issues. In this paper, we present a system for human settlement mapping from the Landsat archive using an automated algorithm, together with preliminary results from the system.

METHODOLOGY
Human settlement mapping was initially developed using coarse-resolution satellite data, such as DMSP-OLS and MODIS (Center for International Earth Science Information Network et al., 2004; Schneider et al., 2010). The resulting maps contributed to socio-economic analyses of urban development thanks to their worldwide consistency (Montgomery, 2008). A need then emerged for spatially more detailed data to analyse urban form, such as the sprawl and compactness of cities (Angel et al., 2005; Schneider and Woodcock, 2008). For such analyses, Landsat and ASTER are good data sources because they can represent the forms and networks of human settlement, including buildings, paved areas, and other man-made structures (Small, 2005; Esch et al., 2014). Using such higher-resolution satellite data for human settlement mapping with traditional methods requires developing training data, the most labour-intensive process in remote sensing projects, for accurate classification; however, this process is not feasible when extending data development to the global scale. Several research initiatives have therefore developed automated machine learning algorithms that use existing coarse-resolution human settlement maps as training data (Miyazaki et al., 2014; Duan et al., 2015; European Commission Joint Research Centre, 2014).
We applied this approach, which enables the classification of large amounts of satellite data without any manual labour.
Although classification algorithms have been successfully automated, accuracy problems remain because of uncertainties in satellite data quality, such as cloud contamination, which is a considerable constraint for global applications of Landsat data (Ju and Roy, 2008). For the development of a time-series dataset, requirements on observation dates further constrain the availability of good-quality data. To reduce the impact of these uncertainties in data quality, we proposed combining data from multiple observation dates into a single output for a target date. For example, to develop data for 2005, supplementary data from August 2004 and September 2006 were used in addition to the data from 2005.
In the following, we describe the details of the method, which uses coarse-resolution land cover maps as training data and combines supplementary data into a single output.
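As an illustration of how a coarse-resolution map could serve as training data, the following Python sketch derives an initial built-up mask on the Landsat grid. It assumes that the built-up layer corresponds to class 13 ("Urban and built-up") of the IGBP classification in MCD12Q1, that the MODIS array has already been clipped to the footprint of the Landsat scene, and that nearest-neighbour resampling is acceptable. The function name `initial_builtup_labels` and the optional `supplement` mask (for visually interpreted built-up areas) are hypothetical.

```python
import numpy as np

# Assumption: IGBP class value for "Urban and built-up" in MCD12Q1.
IGBP_URBAN = 13


def initial_builtup_labels(mcd12q1_igbp, target_shape, supplement=None):
    """Derive an initial built-up mask on a finer (e.g. Landsat 30 m) grid.

    mcd12q1_igbp : 2-D integer array of IGBP classes, assumed to be clipped
        to the same footprint as the Landsat scene.
    target_shape : (rows, cols) of the Landsat grid.
    supplement   : optional boolean mask on the Landsat grid with visually
        interpreted built-up areas (hypothetical input).
    """
    coarse = mcd12q1_igbp == IGBP_URBAN
    src_rows, src_cols = coarse.shape
    dst_rows, dst_cols = target_shape
    # Nearest-neighbour resampling: map each fine pixel to the coarse pixel
    # whose index range contains it.
    r_idx = (np.arange(dst_rows) * src_rows) // dst_rows
    c_idx = (np.arange(dst_cols) * src_cols) // dst_cols
    fine = coarse[np.ix_(r_idx, c_idx)]
    # Add built-up areas omitted by the coarse map, if any were supplied.
    if supplement is not None:
        fine = fine | supplement.astype(bool)
    return fine
```

For example, a 2 x 2 MODIS array resampled to a 4 x 4 grid expands each coarse pixel into a 2 x 2 block, and any supplemental pixels are added to the resulting mask.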

Learning with Local and Global Consistency
For the classification of pixels into built-up and non-built-up areas, we applied a machine learning algorithm known as Learning with Local and Global Consistency (LLGC) (Zhou et al., 2003), which has been applied to human settlement mapping using ASTER satellite data with 15-m resolution (Miyazaki et al., 2013). LLGC made it possible to use existing coarse-resolution human settlement maps as training data for classifying ASTER pixels into built-up or non-built-up areas through iterative graph-based clustering.
Another advantage of LLGC is its computational efficiency. Although the algorithm is iterative in concept, its solution can be obtained with a few matrix operations (Zhou et al., 2003). This advantage is important for developing global human settlement maps from large amounts of satellite data. Note that LLGC yields not only classification results (built-up or non-built-up) but also a confidence value for each classification, ranging from 0.0 to 1.0, which can be interpreted as the probability that a built-up area exists in the pixel.
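The closed-form solution mentioned above can be illustrated with a minimal dense implementation of LLGC following Zhou et al. (2003): a Gaussian affinity matrix W with a zero diagonal is normalised as S = D^(-1/2) W D^(-1/2), and the label scores are obtained as F* proportional to (I - alpha S)^(-1) Y. The values of `alpha` and `sigma`, and the conversion of the two class scores into a 0-1 confidence by normalising F*, are assumptions made for this sketch. A full Landsat scene would need a sparse or subsampled graph rather than a dense n x n matrix.

```python
import numpy as np


def llgc(features, labels, alpha=0.99, sigma=1.0):
    """Dense sketch of Learning with Local and Global Consistency.

    features : (n, d) array of pixel feature vectors (e.g. band reflectances).
    labels   : (n,) int array; 1 = built-up, 0 = non-built-up, -1 = unlabelled.
    Returns a built-up confidence in [0, 1] for every pixel.
    """
    n = features.shape[0]
    # Gaussian affinity matrix with zero diagonal.
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Initial label matrix Y: column 0 = non-built-up, column 1 = built-up.
    Y = np.zeros((n, 2))
    Y[labels == 0, 0] = 1.0
    Y[labels == 1, 1] = 1.0
    # Limit of the iteration F <- alpha S F + (1 - alpha) Y, up to a constant
    # factor that does not change the classification.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    # Assumed normalisation of the two non-negative scores into a confidence.
    return F[:, 1] / (F[:, 0] + F[:, 1] + 1e-12)
```

Pixels are then classified as built-up where the confidence exceeds the non-built-up score, that is, where it is above 0.5.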
Because the confidence values were likely to depend on the proportions of the initial classes (built-up and non-built-up in this case), we restricted the processing extent to a buffer around the initial built-up extent so that equal numbers of built-up and non-built-up pixels were processed by LLGC.
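One way to obtain such a balanced processing extent is to dilate the initial built-up mask step by step until the surrounding buffer contains at least as many non-built-up pixels as the built-up extent. The sketch below assumes an 8-connected, one-pixel dilation per step; the result is only approximately balanced, because the last step may overshoot. The function names `dilate` and `balanced_processing_mask` are hypothetical.

```python
import numpy as np


def dilate(mask):
    """One step of binary dilation with a 3 x 3 (8-connected) neighbourhood."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    rows, cols = mask.shape
    # OR together the mask shifted to each of the nine neighbour offsets.
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out |= padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return out


def balanced_processing_mask(builtup, max_steps=1000):
    """Grow a buffer around the initial built-up extent until the buffer
    holds at least as many non-built-up pixels as there are built-up pixels.

    builtup : 2-D boolean array of the initial (e.g. MCD12Q1-derived) extent.
    Returns a boolean mask of the pixels to pass to LLGC.
    """
    builtup = builtup.astype(bool)
    n_built = int(builtup.sum())
    region = builtup.copy()
    for _ in range(max_steps):
        n_buffer = int((region & ~builtup).sum())
        if n_buffer >= n_built or region.all():
            break
        region = dilate(region)
    return region
```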

Composition of the LLGC results
To reduce the impact of this uncertainty, the confidence values from several LLGC results, obtained from Landsat data within a certain range of observation dates, were composited into a single output by taking the per-pixel median. For example, the results for August 2004, July 2005, October 2005, and September 2006 were used for the human settlement mapping for 2005.
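The per-pixel median composition can be sketched with NumPy's `nanmedian`. Treating cloud-masked or missing pixels as NaN so that they are ignored, and labelling pixels with a median confidence of at least 0.5 as built-up, are assumptions of this sketch rather than details given above; `composite_confidence` is a hypothetical name.

```python
import numpy as np


def composite_confidence(confidences, threshold=0.5):
    """Per-pixel median of several LLGC confidence maps.

    confidences : list of 2-D arrays in [0, 1], one per Landsat scene;
        cloud-masked or missing pixels are NaN (assumption).
    threshold   : hypothetical cut-off for labelling a pixel built-up.
    Returns (median confidence, boolean built-up map).
    """
    stack = np.stack(confidences, axis=0)
    # nanmedian ignores NaN values along the scene axis.
    median = np.nanmedian(stack, axis=0)
    builtup = median >= threshold
    return median, builtup
```

Pixels that are NaN in every input produce a NaN median (NumPy emits a RuntimeWarning) and are labelled non-built-up, because comparisons with NaN evaluate to False.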
The Landsat scenes were automatically assigned to each target year according to a score calculated from the percentage of cloud cover (less cloud cover is preferable) and the time between the target date and the observation date (an observation date closer to the target is preferable).
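A minimal sketch of this scene selection is given below. The linear form of the score, its equal weights, and the mid-year target date in the usage note are hypothetical; the cloud-cover limit of 20% and the choice of four scenes per target year follow the experiment described in this paper. The `Scene` record and function names are likewise illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Scene:
    scene_id: str
    acquired: date
    cloud_cover: float  # percent, 0-100


def scene_score(scene, target, w_cloud=1.0, w_time=1.0):
    """Lower is better. Hypothetical linear score combining cloud cover
    (as a fraction) and the distance from the target date (in years)."""
    years_off = abs((scene.acquired - target).days) / 365.25
    return w_cloud * scene.cloud_cover / 100.0 + w_time * years_off


def select_scenes(scenes, target, n=4, max_cloud=20.0):
    """Keep scenes under the cloud limit and return the n best-scoring ones."""
    candidates = [s for s in scenes if s.cloud_cover < max_cloud]
    return sorted(candidates, key=lambda s: scene_score(s, target))[:n]
```

For example, `select_scenes(scenes, date(2005, 7, 1))` returns up to four scenes with less than 20% cloud cover, ranked from the best score.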

IMPLEMENTATION OF THE SYSTEM
We implemented the algorithms on a high-performance computing system at the University of Tokyo, the Data Integration and Analysis System (DIAS). DIAS provided parallel computing with a few tens of processors. For flexibility and configurability, we implemented the algorithms and built the programs using only open-source software. Because DIAS is a Linux-based system, and Linux originated from open-source development, the construction of the system was open-source-friendly, so the system can be ported to other Linux-based computing systems.

EXPERIMENT RESULTS AND DISCUSSIONS
We conducted an experiment applying the proposed method to Landsat scenes covering cities with populations of more than one million for 1990, 2000, 2005, and 2010. The Landsat data were retrieved from the Landsat data archive of the US Geological Survey (U.S. Geological Survey, 2015). We used the built-up layer extracted from MCD12Q1 (Land Processes Distributed Active Archive Center, 2014), a MODIS-based global land cover map, as training data for built-up areas. The algorithm was applied to 5200 Landsat scenes for 1990, 2000, 2005, and 2010. Some examples of the results are presented in Figure 2. The results represented the urban expansion of the cities over the last decades well. However, some results for 1990 appeared to cover most of the human settlement extent, even though urban development in these cities progressed rapidly after 2000. This may be due to overestimation of built-up areas for 1990 at pixels that the results for 2000, 2005, and 2010 classified as non-built-up. This kind of inconsistency could be corrected by comparing the results among the target years, for example by majority decision or Bayesian inference. For example, if a pixel is classified as built-up for 1990 and as non-built-up for 2000, 2005, and 2010, the pixel should also be classified as non-built-up for 1990.
While such overestimations were observed for some cities, the results for other cities showed substantial underestimation. This was due to underestimation of built-up areas in MCD12Q1 for those cities: because the processing extent was masked to balance the numbers of built-up and non-built-up pixels, the algorithm depended on the initial extent of human settlement (the built-up areas in MCD12Q1 in this case).
We added supplemental data on initial built-up areas, prepared by visual interpretation of Landsat false colour composites for the respective target years (Figure 1). The result improved with the supplemental data, even though the visual interpretation was conducted very roughly, at a scale much coarser than Landsat's spatial resolution. These results indicate that the system should include a function for collecting visually interpreted data that supplement built-up areas omitted from MCD12Q1.
Figure 1. Improved result for Maputo with supplemental data of initial built-up areas.
For further improvement of the method, we will conduct a statistical accuracy assessment, for example using an error matrix (Foody, 2002). We will also publish the result data for end-user applications once the accuracy reaches product quality.
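The majority decision suggested above for correcting inconsistencies among target years could be sketched as follows. The rule used here, which relabels a built-up pixel as non-built-up when most later target years classify it as non-built-up, is a hypothetical generalisation of the 1990 example: it reproduces that example but is not the method applied in the experiment, and Bayesian inference is not covered. The function name is illustrative.

```python
import numpy as np


def enforce_temporal_consistency(maps):
    """Majority-decision clean-up of built-up maps ordered by target year.

    maps : 3-D boolean array (n_years, rows, cols), oldest year first.
    A pixel labelled built-up in one year is relabelled non-built-up when
    most of the later years label it non-built-up (hypothetical rule).
    The latest year is left unchanged.
    """
    maps = np.asarray(maps, dtype=bool)
    cleaned = maps.copy()
    n_years = maps.shape[0]
    for t in range(n_years - 1):
        later = maps[t + 1:]
        # Count, per pixel, how many later years are non-built-up.
        later_non_built = (~later).sum(axis=0)
        majority_non_built = later_non_built > later.shape[0] / 2
        cleaned[t] &= ~majority_non_built
    return cleaned
```

With maps for 1990, 2000, 2005, and 2010, a pixel that is built-up only in 1990 is relabelled non-built-up, whereas a pixel that is built-up in 1990, 2005, and 2010 keeps its 1990 label.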

Table 1 lists the open-source software used for the system.
Table 1. List of software used for the human settlement mapping system