SPARSE UNMIXING VIA VARIABLE SPLITTING AND AUGMENTED LAGRANGIAN FOR VEGETATION AND URBAN AREA CLASSIFICATION USING LANDSAT DATA

In this paper, we explore the possibility of sparse regression, a new direction in unmixing, for vegetation and urban area classification. SUnSAL (Sparse unmixing via variable splitting and augmented Lagrangian) in both unconstrained and constrained forms (with the abundance non-negativity and abundance sum-to-one constraints) were used with a set of global endmembers (substrate, vegetation and dark objects) to unmix a set of computer simulated noise-free and noisy data (with Gaussian noise of different signal-to-noise ratio) in order to judge the robustness of the algorithm. The error in the fractional estimate was examined for varying noise power (variance): 2, 4, 8, 16, 32, 64, 128 and 256. In the second set of experiments, a spectrally diverse collection of 11 scenes of Level 1 terrain corrected, cloud free Landsat-5 TM data representing an agricultural setup in Fresno, California, USA were used. The corresponding ground data for validation were collected on the same days of satellite overpass. Finally in the third set of experiments, a clear sky Landsat-5 TM data for an area near the Golden Gate Bridge, San Francisco (an urbanized landscape), California, USA were used to assess the algorithm. The fractional estimates of the 30 m Landsat-5 TM data were compared with the fractional estimates of a high-resolution World View-2 data (2 m spatial resolution) obtained using a fully constrained least squares algorithm. The results were evaluated using descriptive statistics, correlation coefficient, RMSE, probability of success and bivariate distribution function, which showed that constrained model was better than unconstrained form.


INTRODUCTION
The conversion of vegetation land cover (LC) such as forest and farmland to impervious surfaces are sites of significant natural resource transformation.These impervious surfaces are categorised as settlements or urban areas and are currently among the most rapidly changing LC types and the loci of human population activities (Lambin et al., 2001).This unprecedented LC change is leading to environmental degradation and ultimately influencing weather and climate from local to global levels.Therefore, mapping and assessing the status of LC at various scales and time frames is required for mitigation, planning, and decision making.The LC patterns can be captured through multi-resolution space-borne remote sensing (RS) data that facilitate observations across larger extent of the Earth's surface.
Usually, vegetation and urban areas have more heterogeneity with contrasting features compared to other LC classes.Since, these LC features occur at finer spatial scales than the resolution of the primary satellites, observed data are mixture of two or more LC classes, representing a mixed pixel.Therefore, these two environments are particularly difficult to model because of considerable spatial and spectral variation and deriving accurate, quantitative measures over such regions remains a challenge (Forester, 1985;Lu and Weng, 2004;Xian and Crane, 2005).The solution to mixed pixel problem typically centers on spectral unmixing techniques (Ceamanos et al., 2011) such as linear mixture model (LMM), which estimates the abundance or proportion of each class within individual pixels assuming that the reflectance spectrum of a mixture is systematic combination of the component reflectance spectra in the mixture (called endmembers).The optimal solution of the mixture models can be unconstrained or constrained (when the abundance nonnegativity constraint (ANC) and abundance sum-to-one constraint (ASC) are imposed).ANC restricts the abundance values from being negative and ASC confines the sum of the abundance values of all the classes to one.The abundance maps render more accurate fractional estimates of each class compared to assignment of a single class per pixel as in the traditional hard classification scheme.
In this paper, we investigate the possibility of using sparse regression (Bioucas-Dias and Figueiredo, 2010) for unmixing of mixed pixels.We use SUnSAL (Sparse unmixing via variable splitting and augmented Lagrangian) (Iordache et al., 2011) in an unconstraint form, and in the presence of both ANC and ASC imposed on the solution (i.e.SUnSAL in the constrained form -Constrained SUnSAL or CSUnSAL, and the problem is referred to as constrained sparse regression) to obtain class information at subpixel level with computer simulated data, and Landsat data of an agricultural landscape and an urban scenario.In the first set of experiments, computer simulated noise-free data and noisy data of different signal to noise ratio (SNR) are used to estimate the fraction of different endmembers.In separate tests, Gaussian noise (a random variable with 0 mean and fixed variance) was added to the data.Secondly, a spectrally diverse collection of 11 cloud free scenes of Landsat-5 TM data representing an agricultural landscape in Fresno, California, USA were unmixed with three global endmember (substrate, vegetation and dark objects) and validated using ground vegetation proportion.In the third set of experiments to study the urban environment, Landsat-5 TM data (at 30 m spatial resolution) for an area of San Francisco, California, USA was unmixed and the abundance maps were compared with the fractional estimates of World View 2 (WV-2) data (2 m spatial resolution) for validation.The results were evaluated using descriptive statistics, Pearson product-moment correlation coefficient (cc), root mean square error (RMSE), probability of success and bivariate distribution function (BDF).
The paper is organized as follows: section 2 discusses linear mixture model, section 3 discusses sparse regression and the SUnSAL algorithm and section 4 details the data used in this analysis.Results and discussion are presented in section 5 with concluding remarks in section 6.

LINEAR MIXTURE MODEL (LMM)
If there are M spectral bands and N classes, then associated with each pixel is a M-dimensional vector y whose components are the gray values corresponding to the M bands.Let E = [e 1 , …e n- 1 , e n , e n+1 ..., e N ] be a M × N matrix, where {e n } is a column vector representing the spectral signature (endmember) of the nth target material.For a given pixel, the abundance or fraction of the nth target material present in the pixel is denoted by α n , and these values are the components of the N-dimensional abundance vector α.Assuming LMM (Shimabukuro and Smith, 1991), the spectral response of a pixel in any given spectral band is a linear combination of all the endmembers present in the pixel at the respective spectral band.For each pixel, the observation vector y is related to E by a linear model written as where  accounts for the measurement noise.We further assume that the components of the noise vector  are zero-mean random variables that are i.i.d.(independent and identically distributed).Therefore, covariance matrix of the noise vector is σ 2 I, where σ 2 is the variance, and I is M × M identity matrix.

SPARSE REGRESSION
Sparse regression (Iordache et al., 2011) is a new direction in unmixing which is related to both statistical and geometrical frameworks.A mixed pixel with sparse linear mixtures of spectral signatures from a library (available a priori) is fitted.
Estimating or generating the endmembers from the data is not required here.The method depends on searching an optimal subset of signatures that can model each mixed pixel in the data.The degree of coherence (isometry) between the data matrix and sparseness of the endmember signals decides the ability to obtain sparse solutions for an underdetermined system.When the data matrix has low coherence and the signals are sparse, it is an ideal situation for sparse unmixing.
Endmember search is conducted in a large library, say  ∈ ℝ !!!, where M < N and  ∈ ℝ ! .It is possible that only a few signatures contained in E would involve in the mixed pixel spectrum.Therefore,  will contain many values of zero and is a sparse vector.The sparse regression problem is expressed as where  !denotes the number of nonzero components of , and δ ≥ 0 is the noise and modeling error tolerance.Note that δ in (2) is the error tolerance and  in (1) is a M x 1 vector collecting errors affecting the measurements at each spectral band. ≥ 0 and  ! = 1 refers to ANC and ASC, respectively.A set of sparsest signals belonging to the (N-1) probability simplex satisfying error tolerance inequality defines the solution of (2).

SUnSAL (Sparse unmixing via variable splitting and augmented Lagrangian)
When the fractional abundances from sparse regression follow ANC and ASC, the problem is referred to as constrained sparse regression (CSR).The general CSR problem is defined as where  ! and  ! are the l 2 and l 1 norms and λ ≥ 0 is a weighing factor between the l 2 and l 1 terms.
SUnSAL is based on the alternating direction method of multipliers (ADMM) (Eckstein and Bertsekas, 1992;Gabay andMercier, 1976 in Bioucas-Dias andFigueiredo, 2010).ADMM can be derived as a variable splitting procedure followed by the adoption of an augmented Lagrangian method to solve the constrained problem.The algorithm is briefly stated here.For more detailed derivation, the readers are encouraged to refer (Iordache et al., 2011).Assume that matrix E is known and corresponds to underdetermined systems (N > M) rather than obtained from an endmember extraction algorithm (where N < M).Consider arbitrary  > 0,  !,  !∈ ℝ !""_!"# (where aff_dim is an affine dimension), and , where i = 0, 1, …}.
Step 3: Step 5: Step 8: Increment i by 1 Step 9: Exit where and λ is a parameter controlling the relative weight.

Computer Simulations
Simulation of imagery was carried with a set of global spectra of the three endmember libraries (available at: http://www.ldeo.columbia.edu/~small/GlobalLandsat/styled-3/index.html) to generate three abundance maps.Then the linear mixture model in (1) was inverted (with ANC and ASC imposed) to generate computer simulated synthetic noise free data of 6 bands of size 512 x 512.In a separate set of experiments, error in the estimate was examined as the noise power (variance) was set to 2, 4, 8, 16, 32, 64, 128 and 256.This noise is a random number drawn from a Gaussian distribution where the mean of each endmember is set to 0 and the variability is controlled i.e., Gaussian noise = mean + random perturbation; random perturbation is a Gaussian random variable of specific variance.

Landsat data
For the agricultural landscape, a spectrally diverse collection of 11 scenes of Level 1 terrain corrected, cloud free Landsat-5 TM 16 bit (path 43, row 35) of Fresno, California (figure 1) were used.These data were captured on April 4 and 20, May 22, June 7 and 23, July 9 and 25, August 26, September 11 and 27 and October 13 for the year 2008 and were calibrated to exoatmospheric reflectance using the calibration approach and coefficients given in Chander et al., (2009).These scenes were selected because a coincidental set of ground canopy cover were available for a number of surveyed field within the footprint of Landsat WRS path 43, row 35.The atmospheric reflectance were converted to surface reflectance correcting for atmospheric effects by means of the 6S code implementation in the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) atmospheric correction method (Masek et al., 2006).74 polygons of the fractional vegetation cover were generated from digital photographs taken with a multispectral camera mounted on a frame at nadir view pointed 2.3 m above the ground at the commercial Joaquin Valley (in central California) on 11 dates mentioned above, except for one date when the Landsat acquisition preceded the ground observation by one day.For each date, 2-4 evenly spaced pictures were taken for an area of 100 m x 100 m with center location marked by a GPS (Johnson and Trout, 2012).These fractional measurements belonged to a diverse set of seasonal and perennial crops in various developmental stages, from emergence to full canopy that represented an agricultural scenario/environment in the RS data.
To study an urban scenario, a set of coincident clear sky Landsat-5 TM data and WV-2 data for an area of San Francisco (SF) were used to assess the algorithm.SF is chosen for the test site because of its urbanized landscape as shown in figure 2. WV-2 data were acquired a few minutes after the Landsat TM data acquisition on May 1, 2010 for an area near the Golden Gate Bridge.The spectral range of first four bands (Blue, Green, Red and NIR) of Landsat data have a good correspondence with the WV-2 bands 2, 3, 5 and 7 in terms of wavelength range in the Electromagnetic spectrum and therefore they have a similar mixing space.WV-2 data were converted to Top of Atmosphere Reflectance values using the python program (available at https://github.com/egoddard/i.wv2.toar) in GRASS GIS 7.1.The Landsat unmixed images were compared with the corresponding WV-2 fraction images for accuracy assessment.

Endmember generation
Global mixing spaces were sampled by using a spectrally diverse LC and diversity of biomes with 100 Landsat ETM+ scenes (Small and Milesi, 2013).This defined a standardized set of spectral endmembers of substrate (S), vegetation (V), and dark objects (D).Vegetation refers to green photosynthetic plants, dark objects encompass absorptive substrate materials, clear water, deep shadows, etc., and substrate includes soils, sediments, rocks, and non-photosynthetic vegetation.For simplicity, we refer substrate, vegetation and dark objects as "S", "V", and "D" (SVD) in the rest of this paper.The SVD endmember coefficient, in addition to dates and locations of each subscene are available at   (Small, 2004).
For validation of computer simulated data, the estimated class proportion maps were compared with the synthetic true abundance maps using visual investigation and various other measures such as descriptive statistics (minimum and maximum fractional estimates), cc, RMSE, probability of success ( ! ) and BDF.Considering P = number of observed M-dimensional pixel vector, the true abundance r ranges from −1 to 1. 1 implies that a linear equation describes the relationship between  and  perfectly with all the data points lying on a straight line for which  increases as  increases.r = −1 infers that all data points lie on a line for which  decreases as  increases and r = 0 means there is no linear correlation between the true and estimated abundances.RMSE (Nascimento and Dias, 2005) is defined as: Smaller the RMSE, better the unmixing result and higher is the accuracy. ! is an estimate of the probability that the relative error power is smaller than a certain threshold (Iordache et al., 2011) i.e.  !≡ ( If threshold is 10, and  != 1, it suggests that the total relative error power of proportional abundances is less than 1/10 with a probability of 1. Estimation result is accepted when ! !≤ 0.95 (5.22 dB).0.95 is the average of the 99 th percentile of all the abundances of the three endmembers for noise variance 8.At this noise variance, the SNR which is the logarithm to the base 10 of the ratio of sum of the square of the true abundances to the sum of the square of the difference between the estimated and the true abundances turns out to be 5.22 dB.Empirically, we found that when  !=1, then 1 dB ≤ the SNR for the entire pixels in the abundance ≤ 8 dB for our data set.BDF was used to visualize the accuracy of prediction by mixture models.Points along a 1:1 line on the BDF graph indicate predictions that match completely with the real/actual/reference proportions.The smaller the difference between reference and estimated proportions, the closer the points will lie to the diagonal 1:1 line.
The SUnSAL programs in Matlab were obtained from the authors which are also available online for download (http://www.lx.it.pt/~bioucas/code/sunsal_demo.zip).GRASS (Geographic Resources Analysis Support System) -a free and open source package was used for visualization of results and statistical analysis was carried in R statistical package in a Linux system environment on the NASA Earth Exchange at the NASA Advanced Supercomputing Facility.The parameter λ in SUnSAL was set to 0.001, maximum number of iterations was set to 100 and all other parameters were set to default.

Computer simulations
Figure 3 (a-c) shows noise free synthetic abundance maps for endmember 1, 2 and 3. Figure 3 (d-f) shows estimated abundance maps obtained for each signature class (for the three endmembers) corresponding directly to the gray scale values for each image from SUnSAL, (g-i) from CSUnSAL with the range of abundance fraction values specified in square bracket [minimum abundance value -maximum abundance value] underneath each figure.Visual examination of the abundance maps of each category revealed that they were similar in terms of the relative fractions.Table 1 reveals that for the noise free data, both models have high cc, low RMSE and  != 1.At noise variance 256, CSUnSAL showed higher cc and lower RMSE.At 256 noise level, SUnSAL has worst performance for all the endmembers.The details of other noise variances are omitted due to space constraints.For each endmember, both SUnSAL and CSUnSAL showed high cc (close to 1) when variance in the noise was increased till 16, beyond which cc gradually decreased and reached a minimum of 0.12 for endmember 3 for SUnSAL.To a certain noise level (noise variance 32), both the models were robust, however as noise increased in the data, they tend to produce higher RMSE following a hyperbolic curve.The analysis revealed that the unconstrained model performed well in identification and discrimination of the classes, however it did not provide accurate abundance estimation.

An agricultural landscape
Each of the 11 atmospherically corrected Landsat scenes was unmixed to obtain the abundance estimates within each pixel.Figure 4 shows sample estimated abundance maps from one of the Landsat scenes for S, V and D classes obtained from CSUnSAL.For each scene, the proportions of vegetation fraction in the image were compared with the ground observations.Table 1.Cc, RMSE and ! for endmember 1, 2, and 3 (E1, E2, E3) for noise variance 0 and 256 Figure 5 shows BDF plots of the real/true vegetation fractions against estimated abundances along with the regression line and R 2 values.MAE of vegetation fraction for SUnSAL and CSUnSAL was 0.08 and cc between fractions of vegetation ground polygons to the fraction abundance estimates was 0.98.
A comparison of the satellite derived vegetation fraction with ground measurements showed that both models provided a reasonably accurate direct estimation of fractional cover from Landsat data in the context of this study, accurately reproducing the proportions of SVD endmembers in the 11 scenes under investigation.There could be effect of a few factors contributing to the errors in abundance estimates from the images.For the Fresno area, the acquisitions were taken during clear sky conditions (except for 3 days) and also coincided with the approximate time of the satellite overpass that took into account the illumination effect.The images were corrected to surface reflectance to curtail the effect of atmospheric noise.To avoid the geolocation errors caused due to misregistration, atmospheric effects, presence of background mixed with substrate, etc. a matrix of 3 x 3 pixels centered over the GPS location was used.Thus estimates of ground fractional cover from the 2-4 digital photographs well represented the field conditions within the Landsat IFOV of the 3 x 3 pixels window to which they were compared.Nevertheless, since fraction estimates from digital images were based on image segmentation and thresholding identified from distribution of camera pixels, there could be issues while resolving the ground cover estimates with the mixture model outputs.Field data were gathered in the absence of topography, so soils from two different field conditions may differ, causing minor errors in abundance estimates of substrate and dark objects.This difference is anticipated to be more with lower vegetation fraction cover than at dense vegetation sites, but the image derived fraction estimates closely matched the ground observations on sparse vegetation conditions, appreciating the fact that vegetation fraction from the image is modelled only for the portion that is illuminated by sunlight and the shaded portions of the canopy are likely to be assigned to the dark fractions.The algorithm with the available global endmembers accounted for the variance in the soil by substrate and dark object fractions, given the fact that crop conditions were overall very uniform.

An urban scenario
Landsat and WV-2 data of SF (figure 6) were unmixed using the global endmembers (SVD) to study the performance of the algorithm.WV-2 data were unmixed using fully constrained least squares (FCLS) method (Chang, 2003) since it is robust and is often used as a benchmark for comparing the performance of other unmixing techniques.Figure 7 shows the estimated abundance fraction corresponding directly to the gray scale values for each abundance map from Landsat and WV-2 data for SVD classes. 2 m resolution is adequate to capture small to medium sized buildings, sidewalks, streets and trees as evident from figure 8.At 30 m resolution, each 2 m WV-2 pixel is even less than 0.5% of the area within the 30 m full width half maximum of Landsat point spread function.
The 2 m WV-2 fractions were convolved with a Gaussian low pass filter having 30 m full width half maximum, with the point spread function of the Landsat sensor and resampled to 30 m. Convolution followed by resampling of the high resolution data to 30 m resolution allowed assessing the geo-correction and comparison of the two data sets.Coordinate comparison of the high low-resolution data sets at many random pixels corresponding to same spatial locations did not reveal any systematic image registration error.The WV-2 SVD fractions were then compared to Landsat SVD fractions through MAE and cc for each abundance map.MAE of S, V, and D fractions for SUnSAL was 0.11, 0.07 and 1.99.SUnSAL had cc (statistically significant at 0.99 confidence level, p-value < 2.2e - 16 ) 0.86 for S, 0.88 for V and -0.03 for D classes.Note that the third endmember (D -Dark objects) has not been classified properly with the unconstrained form of the algorithm.The minimum and maximum abundance values for the third endmember (D) obtained from SUnSAL were (-20, 57).For the constrained form, minimum and maximum  For the SF area, most of the urban pixels were either mixed with vegetation (urban forest), roads, shadows or appear like dark objects because of the different materials used in the construction of the terrace.Nevertheless, this study shows that urban reflectance can be accurately modelled with a three endmember mixture model using Landsat and WV-2 data.Error in SF Landsat data fraction estimation could be either due to error in estimates from the algorithm or geo-registration or both.SVD model represented the land surface as independent constituents with different landscape properties such as vegetation and urban.It characterized the fraction of illuminated vegetation, substrate or impervious materials and the shadowed or nonreflective surfaces such as water, roofing tar, etc. High substrate fractions are rational estimates of the impervious surface in developed land in temperate and tropical regions as pervious surfaces are mostly covered by some kind of vegetation and exposed substrate are most likely impervious.Sometimes, if both pervious and impervious surfaces are exposed and illuminated, then this leads to ambiguity, a tricky classification in arid and semi arid regions.Small and Lu (2006) suggests to use vegetation fraction as a proxy for fractional pervious surface because vegetation cannot thrive on impervious surface, so presence of vegetation implies presence of some amount of pervious Therefore, using detectable vegetation as an indicator of permeable surface can account for the range of natural and built surfaces.
Sparse techniques are meant to fundamentally look for the endmembers in a spectral library containing spectra of many materials, with only a few of them present in a pixel, i.e., the vector of fractional abundances is sparse and enforce the sparsity of the solution explicitly.However, sparse unmixing demonstrated good performance in this study even though the number of endmembers in the available library were only three.
In future studies, the algorithm would be implemented on hyperspectral data with a large library of spectral endmembers to assess its robustness.In the absence of spectral library, the technique can also be used with image derived endmembers.

CONCLUSION
In this paper, sparse regression was explored to unmix Landsat data of an agricultural and urban scenario.It was observed that the areas with high fractional abundance of the considered endmember were well delineated while mixed regions with low abundance were more homogeneous.The abundance maps obtained from both unconstrained and constrained form of the algorithm have spatial consistency with good accuracy in the spatial distribution of the classes.The outcome of this study revealed that the constrained form of the algorithm could adequately model the data for unmixing better than the unconstrained form.

Figure 1 .
Figure 1.Study area: San Joaquin Valley in central California.Field data collection site in San Joaquin Valley with surveyed boundaries (marked in black color polygons) overlaid on a false color composite (FCC) of Landsat data from which ground fractional cover were derived for validation.

Figure 2 .
Figure 2. Study area: FCC of a part of San Francisco city.Zoomed image of the urban area (marked with rectangle in inset) shows mixing of substrate with vegetation, roads, shadows and dark objects.

Figure 3 .
Figure 3. Abundance maps: (a -c) synthetic abundance for endmember 1-2-3; (d -f) abundance maps from SUnSAL, and (g -i) abundance maps from CSUnSAL from noise free data.In the figure, black indicates absence of a particular class (the minimum abundance value) and white indicates full presence of that class in a pixel (the maximum abundance value).Intermediate values of the shades of gray represent mixture of more than one class in a pixel.

Figure 4 .
Figure 4. Abundance maps obtained from CSUnSAL for SVD classes.In the legend, 0 (black) indicates absence of a particular class and 1 (white) indicates full presence of that class in a pixel.Intermediate values of the shades of gray represent mixture of more than one class in a pixel.

Figure 5 .
Figure 5. BDF plots of vegetation fraction at each of the sampled locations and regression lines of ground cover proportions (real abundances) versus estimated abundances from SUnSAL and CSUnSAL.

Figure 6 .
Figure 6.FCC of San Francisco area in Landsat and WV-2 resolution.

Figure 7 .
Figure 7. Endmember fractions of SVD from Landsat data using SUnSAL (first row), CSUnSAL (second row) and from WV-2 data (third row).For SUnSAL, the range of abundance fraction values for S, V and D are specified in square brackets underneath each figure [minimum abundance value -maximum abundance value].For other abundance maps, 0 (black) indicates absence of a particular class and 1 (white) indicates full presence of that class in a pixel as shown in the legend.Intermediate values of the shades of gray represent mixture of more than one class in a pixel.

Figure 8 .
Figure 8. FCC of validation site near Golden Gate Bridge, SF at Landsat and WV-2 spatial resolution.

Figure 9 .
Figure 9. BDF plots of SVD fractions from WV-2 and Landsat and regression lines of WV-2 estimates (real abundance) versus estimated abundance from SUnSAL and CSUnSAL.