MODELING SPATIAL DISTRIBUTION OF A RARE AND ENDANGERED PLANT SPECIES (Brainea insignis) IN CENTRAL TAIWAN

With the rate of species extinction increasing, we should choose the right methods, sustainable and grounded in appropriate science and human needs, to conserve ecosystems and rare species. Species distribution modeling (SDM), which uses 3S technology (remote sensing, GIS, and GPS) and statistics, has become increasingly important in ecology. Brainea insignis (cycad-fern, CF) has been categorized as a rare, endangered plant species and was therefore chosen as the target of this study. Five sampling schemes were created from different combinations of CF samples collected at three sites in Huisun forest station and one site 10 km farther north of Huisun. Four models, MAXENT, GARP, generalized linear models (GLM), and discriminant analysis (DA), were developed from topographic variables and evaluated under the five sampling schemes. MAXENT had the highest accuracy, followed by GLM and GARP, while DA had the lowest. More importantly, in the first round of SDM the models identified potential habitat covering less than 10% of the study area, thereby prioritizing either field-survey areas, where microclimatic, edaphic, or biotic data can be collected to refine predictions of potential habitat in later rounds of SDM, or search areas for discovering new populations. However, the results showed that predictive models based merely on topographic variables are unlikely to extend the spatial patterns of CFs from one area to another separated by a large distance, or to a larger area. Follow-up studies will attempt to incorporate proxy indicators that can be extracted from hyperspectral images or LIDAR DEMs and substituted for direct parameters, so that predictive models can be applied on a broader scale.


INTRODUCTION
Biodiversity is very important for humans and all other species on Earth. Without species diversity, ecosystems are more vulnerable to natural disasters and climatic change. With the rate of species extinction increasing, we must conserve ecosystems and rare species by choosing the right methods, sustainable and grounded in appropriate science and human needs. Forest resources in Taiwan are very abundant, but the island's environmental carrying capacity is vulnerable, so conservation must be considered whenever these resources are used.
SDM uses the environmental conditions at known sites to predict species occurrence in unsurveyed areas, and also to identify the relationship between a species and its environment. The distribution pattern of natural vegetation is associated with four types of environmental factors: climatic, physiographic, edaphic, and biotic (Su, 1987). For SDM, it is desirable to predict a species distribution on the basis of ecological (direct) parameters (i.e., climatic, soil, and biotic factors) that are the causal, driving forces of its distribution. Data for such direct parameters, however, are generally difficult or expensive to measure; soil data are even more difficult to derive, and such data tend to be less accurate than purely topographic characteristics (Guisan and Zimmermann, 2000). Moreover, biotic factors are extremely difficult to estimate because of the fine spatiotemporal resolution and potentially complex nature of biotic dimensions (Barve et al., 2011).
On the other hand, indirect parameters (e.g., topographic variables such as elevation, slope, and aspect) are the most easily measured by remote sensing and are often used because they correlate well with observed species patterns (Guisan and Zimmermann, 2000). Because not all the data needed for the four types of factors mentioned above are readily available at one time, SDM should be run iteratively, with topographic data in the initial rounds and climatic, soil, or biotic data, when available, in later rounds.
In this study, we used four methods, MAXENT, GARP, GLM, and DA, to build models and predict the potential habitat of a rare plant under five different sampling schemes. Because our study area falls within a homogeneous climatic zone spanning one degree of latitude, we took account of the area's microclimate, which in turn affects species distribution; indeed, the topography of an area influences its microclimate (Su, 1987). Furthermore, fine spatial-resolution soil and biotic data are not yet available. Hence, we ran the four SDM models on an iterative basis, incorporating elevation, slope, aspect, terrain position, and a vegetation index derived from SPOT images in the first round. We designed five sampling schemes from two areas: 1) a small range, with a distance of 0.7 km between sampling sites, and 2) a large range, with a distance of about 10 km between sampling sites. We evaluated these models in terms of accuracy and implementation efficiency and determined the best one for predicting the habitat of a rare plant. The predictive outcome of SDM would be used to prioritize either field-survey areas for collecting fine-resolution microclimatic, edaphic, or biotic data to refine predictions of potential habitat in later rounds of SDM, or search areas for discovering new populations.

STUDY AREA
The study area consists of two parts, Huisun Experimental Forest Station (HEFS) and Tong-Mao Mountain, as shown in Figure 1. HEFS is in central Taiwan, within 24°2′-24°5′ N latitude and 121°3′-121°7′ E longitude. The station is the property of Chung-Hsing University, has a total area of 7,477 ha, and ranges in elevation from 454 m to 3,419 m; its climate is temperate and humid. As a result, the area supports about 1,100 plant species and is a representative forest of central Taiwan. Samples for this study were taken from Sihwufongshan (Pine-breeze Mountain), Duhchuanling (Cuckoo Ridge), and the Kuandaushan (Big-knife Mountain) trail in Huisun. Sihwufongshan ranges in elevation from 680 m to 840 m, the highest point of Duhchuanling is approximately 810 m, and Kuandaushan lies at approximately 760 m. According to the climate record of this study area, the annual mean temperature is 21.0℃; the highest monthly mean temperature is 30.6℃ in July and the lowest is 20.5℃ in January; mean annual precipitation is 2453.5 mm; and the average relative humidity is 85%. Tong-Mao Mountain is situated at 24°11′ N latitude and 120°57′ E longitude, near the Ta-chia River and the Tong-Mao River, 10 km farther north of the Huisun area. Its elevation rises to 1,690 m above sea level. According to the climate record on the forest district office website, the annual mean temperature is 22.6℃; the highest mean temperature is 29℃ in July and the lowest is 15℃ in January; and mean annual precipitation is 2580 mm. The mountain has rich ecological resources. Cycad-fern (CF), a member of the family Blechnaceae, is found only in mountains of central Taiwan, such as the Huisun and Tong-Mao Mountain areas, and Huisun is its main habitat. Because of its limited ecological range, cycad-fern has been categorized as one of the rare, endangered species (Lu et al., 1986).
Figure 1 Location map of the study area.

Data Collection
We collected a digital elevation model (DEM) with a 5 × 5 m grid, orthophoto base maps (1:10,000), and SPOT images from nine dates (SPOT Image Copyright 2004 and 2005, CNES). In situ data (cycad-fern samples) were acquired with a GPS receiver linked to a 5 m expandable antenna rod and a laser range finder; after post-processing differential correction, the positional error was usually below one meter. SPOT images from two dates (07/10/2004 and 11/11/2005) were chosen because they had the best quality, with the least cloud cover, among the nine dates.

Data Processing
Slope and aspect data layers were generated from the 5 × 5 m DEM. The ridges and valleys in the study area were used together with the DEM to generate a terrain position layer. The main ridges and valleys were interpreted directly from the contour lines on the orthophoto base maps, and these lines were then digitized with ARC/INFO software to build a data layer for later use. The relative position $P_{ij}$ of the test cell P in the terrain is expressed as follows:

$$P_{ij} = \frac{PV}{PV + PR} \qquad (1)$$

where PV is the Euclidean distance from P to the nearest valley line and PR is the Euclidean distance from P to the nearest ridge line. A cell with $P_{ij} = 0.0$ is assigned the terrain position "valley", and a cell with $P_{ij} = 1.0$ is assigned "ridge".
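Equation (1) can be illustrated on a small raster. The sketch below is a hedged example, not the ARC/INFO and ERDAS Imagine workflow used in the study: it assumes the digitized valley and ridge lines have already been rasterized into boolean masks (`valley` and `ridge`, hypothetical names) on the 5 m DEM grid, and it computes the Euclidean distances PV and PR by brute force, which is only practical for small grids.

```python
import numpy as np

def distance_to_nearest(mask: np.ndarray, cell_size: float = 5.0) -> np.ndarray:
    """Euclidean distance (map units) from every cell to the nearest True cell of `mask`.

    Brute force over all line cells: fine for a demonstration, too slow for a full DEM.
    """
    rows, cols = np.indices(mask.shape)
    line_rows, line_cols = np.nonzero(mask)
    # Offsets between every grid cell (R, C, 1) and every line cell (K,) -> (R, C, K).
    d_row = rows[..., None] - line_rows
    d_col = cols[..., None] - line_cols
    return np.sqrt(d_row**2 + d_col**2).min(axis=-1) * cell_size

def terrain_position(valley: np.ndarray, ridge: np.ndarray, cell_size: float = 5.0) -> np.ndarray:
    """Relative terrain position P = PV / (PV + PR): 0 on valley lines, 1 on ridge lines."""
    pv = distance_to_nearest(valley, cell_size)  # distance to the nearest valley line
    pr = distance_to_nearest(ridge, cell_size)   # distance to the nearest ridge line
    total = pv + pr
    # Cells lying on both a valley and a ridge line (total == 0) are left undefined (NaN).
    return np.divide(pv, total, out=np.full_like(total, np.nan), where=total > 0)

# Usage: a 1 x 5 transect with a valley cell at one end and a ridge cell at the other.
valley = np.zeros((1, 5), dtype=bool)
valley[0, 0] = True
ridge = np.zeros((1, 5), dtype=bool)
ridge[0, 4] = True
position = terrain_position(valley, ridge)  # rises linearly from 0.0 to 1.0 along the transect
```

On a real DEM, a dedicated distance transform would replace the brute-force distance function; the ratio in Equation (1) stays the same.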
The vector ridge and valley layer was converted into raster format with ERDAS Imagine software and then combined with the DEM to generate the terrain position layer (Skidmore, 1990). Vegetation indices were derived from the two-date SPOT-5 images, one acquired in autumn and the other in summer, using the Spatial Modeler of ERDAS Imagine. The CF samples obtained by GPS were converted into ArcView shapefile format for later use.
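The study does not name the vegetation index it derived. Purely as a hedged illustration, the sketch below assumes the normalized difference vegetation index (NDVI), computed from the red and near-infrared bands (bands B2 and B3 on SPOT-5), with one NDVI layer per image date so that the summer and autumn layers can be stacked with the topographic layers. The arrays in the usage lines are made-up digital numbers, not study data.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red); cells where NIR + Red == 0 are set to 0."""
    nir = nir.astype(float)
    red = red.astype(float)
    total = nir + red
    return np.divide(nir - red, total, out=np.zeros_like(total), where=total != 0)

# Usage with hypothetical digital numbers for two image dates.
summer = ndvi(np.array([[120, 80]]), np.array([[40, 80]]))
autumn = ndvi(np.array([[100, 0]]), np.array([[50, 0]]))
stacked = np.stack([summer, autumn])  # shape (2, rows, cols), ready for a layer stack
```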
In this study, 221 CF samples were collected by GPS from Sihwufongshan, Duhchuanling, and the Kuandaushan trail in Huisun forest station and from one site on Tong-Mao Mountain. Only part of these samples remained after data integration, because some cycad-ferns fell within the same pixel as others. Five sampling schemes (SS), SS-1 to SS-5, were created from different combinations of the cycad-fern samples collected at the four sites.

Database Building
The GIS database used in the study was constructed with the Layer Stack module of ERDAS Imagine, which overlays the elevation, slope, aspect, terrain position, and vegetation index layers. The cycad-fern sample layer was overlaid on the five data layers, and the pixels of the five layers lying at the same positions as the cycad-fern pixels were extracted. To build the statistical models, sample data for both the target group (cycad-fern) and the non-target group (background) were taken from the data layers by random sampling, to minimize spatial autocorrelation in the independent variables (Pereira and Itami, 1991). Because non-target sites (background) make up the vast majority of the study area, greater variation in environmental characteristics is expected for this group. The number of non-target pixels (sites) should therefore be three times the number of target pixels, to increase the probability of acquiring a representative sample of the habitat characteristics at non-target sites (Pereira and Itami, 1991; Sperduto and Congalton, 1996).
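The 3:1 background sampling described above can be sketched as follows. This is a hedged illustration under stated assumptions: the layer stack is held as a `(n_layers, rows, cols)` array, presence pixels are marked in a boolean mask, and background cells are drawn by simple random sampling without replacement (no minimum spacing between points is enforced). The function names are hypothetical.

```python
import numpy as np

def sample_background(presence: np.ndarray, ratio: int = 3, seed: int = 0) -> np.ndarray:
    """Randomly pick `ratio` x (number of presence cells) background cells, without replacement.

    Returns an (n, 2) array of (row, col) indices that never coincide with a presence cell.
    """
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(~presence)           # flat indices of non-target cells
    n_background = ratio * int(presence.sum())
    chosen = rng.choice(candidates, size=n_background, replace=False)
    return np.column_stack(np.unravel_index(chosen, presence.shape))

def extract_values(layers: np.ndarray, cells: np.ndarray) -> np.ndarray:
    """Read predictor values at (row, col) cells from a (n_layers, rows, cols) layer stack."""
    return layers[:, cells[:, 0], cells[:, 1]].T     # -> (n_cells, n_layers)

# Usage on a toy 10 x 10 grid with four presence cells and five stacked layers.
presence = np.zeros((10, 10), dtype=bool)
presence[[1, 3, 5, 7], [2, 4, 6, 8]] = True
layers = np.random.default_rng(1).random((5, 10, 10))
background_cells = sample_background(presence)
X_background = extract_values(layers, background_cells)
```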

Model Development
The predictive models for selecting the potential habitat of CFs were created using four statistical methods: (1) maximum entropy (MAXENT), (2) genetic algorithm for rule-set prediction (GARP), (3) generalized linear models (GLM), and (4) discriminant analysis (DA). Model development and validation were carried out with a split-sample validation approach, in which the dataset is divided into two subsets: the first (training data) typically comprises one-half to two-thirds of all data, and the second (test data) comprises the remaining one-third to one-half.
The first subset is used to build and test a model. The second, an independent dataset, is used only to test the model, never to build it.
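A split-sample partition of this kind can be sketched as a random permutation of the sample indices. The example below is a minimal sketch, assuming a simple (non-stratified) random split; the two-thirds default mirrors the training share used in SS-1, SS-3, and SS-5, and `split_sample` is a hypothetical name.

```python
import numpy as np

def split_sample(X: np.ndarray, y: np.ndarray, train_fraction: float = 2 / 3, seed: int = 0):
    """Randomly split (X, y) into a training subset and an independent test subset."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, test = order[:n_train], order[n_train:]
    return X[train], y[train], X[test], y[test]

# Usage: 30 samples with two predictors; each row is unique, so overlap is easy to check.
X = np.arange(60).reshape(30, 2)
y = np.r_[np.ones(10), np.zeros(20)]
X_train, y_train, X_test, y_test = split_sample(X, y)
```

Because the test rows never enter model fitting, errors measured on them estimate performance on data the model has not seen.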

Maximum Entropy
MAXENT is a general-purpose method for making predictions or inferences from incomplete information (Pearson et al., 2007). In estimating the unknown probability distribution that defines a species' distribution across a study area, MAXENT formalizes the principle that the estimated distribution must agree with everything that is known (or inferred from the environmental conditions at the occurrence localities) but should avoid placing any unfounded constraints. The approach is thus to find the probability distribution of maximum entropy (the one closest to uniform), subject to constraints imposed by the available information on the observed distribution of the species and on environmental conditions across the study area.
MAXENT requires species-presence data and does not require species absence or pseudo-absence data per se; instead, it uses a probability distribution to distinguish between species presences and random points drawn from a background area. MAXENT offers many advantages and a few drawbacks. The advantages include the following: (1) it needs only presence data, together with environmental information for the study area; (2) it can use both continuous and categorical data and can incorporate interactions between different variables; (3) efficient deterministic algorithms have been developed that are guaranteed to converge to the optimal (maximum entropy) probability distribution; and (4) the MAXENT probability distribution has a clear mathematical definition and is therefore amenable to analysis (Phillips et al., 2006).
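The maximum-entropy idea can be illustrated with a deliberately minimal sketch, not the MAXENT software. The distribution over background cells takes the Gibbs form p(x) ∝ exp(λ·f(x)), and the weights λ are fitted so that feature means under p approach the means at presence cells. Assumptions: linear features only and a small L2 penalty, whereas the MAXENT software uses additional feature types and L1 regularization; the predictors are scaled; and the function names are hypothetical.

```python
import numpy as np

def gibbs(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Maxent (Gibbs) distribution over cells: p(x) proportional to exp(weights . f(x))."""
    scores = features @ weights
    p = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return p / p.sum()

def fit_maxent(features, presence_idx, n_iter=2000, lr=0.5, l2=0.01):
    """Gradient ascent on the L2-penalized log-likelihood of the presence cells.

    features:     (n_cells, n_features) scaled predictors for every background cell
    presence_idx: indices of the cells where the species was recorded
    At the optimum, feature means under p match the presence means (up to the penalty).
    """
    target = features[presence_idx].mean(axis=0)   # empirical constraints from presences
    weights = np.zeros(features.shape[1])          # weights = 0 gives the uniform distribution
    for _ in range(n_iter):
        expected = gibbs(features, weights) @ features
        weights += lr * (target - expected - l2 * weights)
    return weights, gibbs(features, weights)

# Usage: one predictor on 101 cells; presences only where the predictor exceeds 0.5.
x = np.linspace(-1.0, 1.0, 101)[:, None]
presence_idx = np.flatnonzero(x[:, 0] > 0.5)
weights, p = fit_maxent(x, presence_idx)
```

Only presence indices and the background feature matrix are needed, which reflects advantage (1) above: no absence data enter the fit.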

Genetic Algorithm for Rule-set Prediction
GARP has come into extensive use only in recent studies. It seeks a collection of rules that together produce a binary prediction (Phillips et al., 2006). GARP uses a set of point records of species presence and a set of environmental layers that might limit the species' capacity to survive, and it applies a genetic algorithm to search heuristically for a good rule-set. Four rule types are currently available in the GARP software (DesktopGARP): atomic, logistic regression, bioclimatic envelope, and negated bioclimatic envelope rules. GARP uses these rules to search for correlations between species presence/absence and the environmental variables in order to predict suitable conditions for each pixel (Stockwell and Noble, 1992). The computation is repeated for a user-specified number of runs, and each run generates a predictive distribution map. The GARP algorithm starts from an initial set of rules generated by the initial program (Stockwell and Peters, 1999). The first step in the GARP iterative loop is to select a data set by randomly sampling half the available data; the next step is to evaluate the rules on the sampled data.
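The genetic-algorithm loop described above can be illustrated with a toy sketch. It is not DesktopGARP: it evolves only bioclimatic-envelope rules (a lower and upper bound per variable), uses crossover and mutation with simple truncation selection, treats background points as pseudo-absences, and assumes predictors scaled to [0, 1]. As in the GARP loop, each generation draws a random half of the data and evaluates the rules on it. All names and parameter values are hypothetical.

```python
import numpy as np

def predict_envelope(rule, X):
    """An envelope rule predicts presence when every variable lies within [lo, hi]."""
    lo, hi = rule
    return np.all((X >= lo) & (X <= hi), axis=1)

def accuracy(rule, X, y):
    return float(np.mean(predict_envelope(rule, X) == y))

def ordered(lo, hi):
    return np.minimum(lo, hi), np.maximum(lo, hi)

def mutate(rule, rng, scale=0.1):
    lo, hi = rule
    return ordered(np.clip(lo + rng.normal(0, scale, lo.shape), 0, 1),
                   np.clip(hi + rng.normal(0, scale, hi.shape), 0, 1))

def crossover(a, b, rng):
    take_a = rng.random(a[0].shape) < 0.5   # per-variable choice of parent
    return ordered(np.where(take_a, a[0], b[0]), np.where(take_a, a[1], b[1]))

def evolve_envelope(X, y, pop_size=20, generations=50, seed=0):
    """Toy genetic algorithm over envelope rules; X is assumed scaled to [0, 1]."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = [ordered(rng.random(d), rng.random(d)) for _ in range(pop_size)]
    best = max(pop, key=lambda r: accuracy(r, X, y))
    history = [accuracy(best, X, y)]
    for _ in range(generations):
        # As in the GARP loop: sample half the data, then evaluate the rules on that sample.
        half = rng.choice(len(y), size=len(y) // 2, replace=False)
        pop.sort(key=lambda r: accuracy(r, X[half], y[half]), reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            children.append(mutate(crossover(parents[i], parents[j], rng), rng))
        pop = parents + children
        # Keep the best rule seen so far, judged on all data.
        candidate = max(pop, key=lambda r: accuracy(r, X, y))
        if accuracy(candidate, X, y) > history[-1]:
            best = candidate
        history.append(accuracy(best, X, y))
    return best, history

# Usage: presences (y = True) occupy a band of the first variable; the rest are pseudo-absences.
X = np.random.default_rng(42).random((200, 2))
y = (X[:, 0] > 0.3) & (X[:, 0] < 0.7)
best_rule, history = evolve_envelope(X, y)
```

Each call with a different seed plays the role of one GARP run; repeating it would yield several rules, and therefore several binary maps, as described in the text.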

Generalized Linear Models
GLM is an extension of linear models that does not force data into unnatural scales and thereby allows for non-linearity and non-constant variance in the data. GLM assumes a relationship between the mean of the response variable and the linear combination of the explanatory variables, which makes it more flexible and better suited to analyzing ecological relationships (Guisan et al., 2002). The restrictions relaxed above (linearity and constant variance on the original scale of the data) are implicit in ordinary least squares (OLS) regression. In GLMs, the predictor variables $X_j$ ($j = 1, \dots, p$) are combined to produce a linear predictor LP, which is related to the expected value $\mu = E(Y)$ of the response variable Y through a link function $g(\cdot)$:

$$g(E(Y)) = LP = \alpha + X^{\top}\beta$$

where $\alpha$ is a constant called the intercept, $X = (X_1, \dots, X_p)^{\top}$ is a vector of p predictor variables, and $\beta = (\beta_1, \dots, \beta_p)^{\top}$ is the vector of p regression coefficients (one for each predictor). We have written the model for generic variables X and Y; the corresponding expression for the i-th observation in the sample is:

$$g(E(y_i)) = \alpha + x_i^{\top}\beta = \alpha + \sum_{j=1}^{p} \beta_j x_{ij}$$
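The paper does not state which GLM family or link function was used. For binary presence/background data, a binomial family with the logit link $g(\mu) = \log(\mu / (1 - \mu))$ is the usual choice, so the sketch below assumes it and fits $\alpha$ and $\beta$ by iteratively reweighted least squares (IRLS). It is a minimal illustration with hypothetical function names, not the software used in the study, and it assumes the two classes overlap (with perfect separation the estimates diverge).

```python
import numpy as np

def fit_logistic_glm(X, y, n_iter=25, tol=1e-10):
    """Binomial GLM with logit link, fitted by IRLS.

    Returns (alpha, beta) as in g(E(y_i)) = alpha + x_i' beta.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    A = np.column_stack([np.ones(len(y)), X])   # column of ones carries the intercept alpha
    coef = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ coef                           # linear predictor LP
        mu = 1.0 / (1.0 + np.exp(-eta))          # inverse link: mu = g^{-1}(LP)
        w = mu * (1.0 - mu)                      # binomial IRLS weights
        z = eta + (y - mu) / w                   # working response
        new = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * z))
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    return coef[0], coef[1:]

def predict_probability(X, alpha, beta):
    X = np.asarray(X, dtype=float).reshape(-1, len(beta))
    return 1.0 / (1.0 + np.exp(-(alpha + X @ beta)))

# Usage: 4 samples at each of x = -1, 0, 1 with 1, 2, and 3 presences respectively.
x = np.repeat([-1.0, 0.0, 1.0], 4)
y = np.array([1, 0, 0, 0,  1, 1, 0, 0,  1, 1, 1, 0])
alpha, beta = fit_logistic_glm(x, y)
```

For this symmetric toy design the maximum-likelihood solution is $\alpha = 0$ and $\beta = \ln 3$, so the fitted probability of presence at x = 1 is 0.75, matching the observed proportion.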

Discriminant Analysis
DA is a technique that discriminates among k classes (objects) on the basis of a set of independent (predictor) variables.
The objectives of DA are to (1) find linear composites of the n independent variables that maximize the ratio of among-group to within-group variability; (2) test whether the group centroids of the k dependent classes differ; (3) determine which of the n independent variables contribute significantly to class discrimination; and (4) assign unclassified or "new" observations to one of the k classes (Lowell, 1991). The variate of a discriminant analysis, also known as the discriminant function, takes the following form:

$$D = b_0 + \sum_{i=1}^{n} b_i X_i$$

where D is the discriminant score, $b_0$ is a constant, $b_i$ is the discriminant coefficient (weight) of the i-th independent variable, and $X_i$ is the i-th independent variable.
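For the two-group case used here (cycad-fern versus background), the discriminant function can be sketched with Fisher's linear discriminant. This is a hedged illustration: it assumes a pooled within-group covariance matrix and equal prior probabilities, placing the cut-off D = 0 midway between the two group centroids; the software settings actually used in the study are not stated, and the function names are hypothetical.

```python
import numpy as np

def fit_two_group_lda(X, y):
    """Fisher's linear discriminant for two groups (0 = background, 1 = cycad-fern).

    Returns (b0, b) for the discriminant function D = b0 + sum_i b_i X_i,
    with the cut-off D = 0 placed midway between the group centroids (equal priors).
    """
    X = np.asarray(X, dtype=float)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    s_w = ((len(X0) - 1) * np.cov(X0, rowvar=False)
           + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    b = np.linalg.solve(np.atleast_2d(s_w), m1 - m0)
    b0 = -b @ (m0 + m1) / 2.0
    return b0, b

def classify(X, b0, b):
    """Assign observations to group 1 when the discriminant score D is positive."""
    return (b0 + np.asarray(X, dtype=float) @ b > 0).astype(int)

# Usage: three background and three cycad-fern samples with two predictors.
X = np.array([[0, 0], [1, 0], [0, 1], [4, 4], [5, 4], [4, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
b0, b = fit_two_group_lda(X, y)
```

The sign of D assigns a "new" observation to one of the two classes, which corresponds to objective (4) above.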

Model validation
Model validation (evaluation) was performed by split-sample validation, as mentioned previously: for each model, the responses of the held-out data are predicted, and the error is calculated from the predicted and observed values (De'ath and Fabricius, 2000). We also used overall accuracy and the kappa coefficient to assess the models. Overall accuracy uses only the data along the major diagonal of the error matrix and excludes the errors of omission and commission, whereas kappa also incorporates the off-diagonal elements as a product of the row and column marginals (Lillesand et al., 2008).
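Both measures can be computed directly from an error matrix. The sketch below follows the usual form of the kappa estimate, $\hat{K} = (N\sum_i x_{ii} - \sum_i x_{i+}x_{+i}) / (N^2 - \sum_i x_{i+}x_{+i})$, where $x_{ii}$ are the diagonal counts, $x_{i+}$ and $x_{+i}$ are the row and column totals, and N is the total number of samples. The function name and the 2 × 2 matrix in the usage line are hypothetical, not results from this study.

```python
import numpy as np

def accuracy_and_kappa(error_matrix):
    """Overall accuracy and kappa coefficient from a square error (confusion) matrix."""
    m = np.asarray(error_matrix, dtype=float)
    n = m.sum()
    diagonal = np.trace(m)                              # correctly classified samples
    chance = np.sum(m.sum(axis=1) * m.sum(axis=0))      # sum of row total x column total
    overall = diagonal / n
    kappa = (n * diagonal - chance) / (n**2 - chance)
    return overall, kappa

# Hypothetical 2 x 2 matrix: rows = predicted (CF, background), columns = reference.
overall, kappa = accuracy_and_kappa([[45, 5], [10, 40]])
```

For this matrix the overall accuracy is 0.85 but kappa is 0.7, because kappa discounts the agreement expected by chance from the marginal totals.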

Results and Discussion
For the base models shown in Table 1, MAXENT had the highest accuracy in SS-1 (kappa value 0.84), followed by GLM (0.7) and GARP (0.6), while DA (0.55) had the lowest. When the base models were tested in SS-2 with independent samples from the Kuandaushan trail, 0.76 km away from the two training sites in Huisun, the kappa values of the non-parametric algorithms, MAXENT (0.46) and GARP (0.12), dropped sharply, whereas those of the parametric GLM (0.7) and DA (0.55) dropped only slightly. For the first data-merged models in SS-3, the kappa values of all four models recovered to almost the same values as in SS-1 or even higher, and the four models kept the same order of accuracy as in SS-1. When the first data-merged models built in SS-3 were applied in SS-4 to a larger area including Tong-Mao Mountain, 10 km away from the three Huisun sites, the kappa values of MAXENT and DA declined to near zero, and GARP and GLM failed to run, possibly because of a limit on the size of the data layers, a large difference in the value ranges of the predictor variables between Huisun and Tong-Mao, or other factors that remain to be identified. In contrast, when the second data-merged models built in SS-5 were applied to the same area as in SS-4 (Table 2), the kappa value of MAXENT rebounded strikingly, while that of DA recovered slightly. Consequently, predictive models based merely on topographic (indirectly operating) variables were unlikely to accurately extend the spatial patterns of CFs from the Huisun area to the Tong-Mao Mountain area across a 10 km gap, or to the entire study area encompassing Huisun.
The models, both the base models in SS-1 and the first data-merged models in SS-3, accurately predicted the potential habitats of CFs in Huisun and substantially reduced the area of field survey to less than 10% of the entire study area, and even to less than 2.5% with MAXENT (Tables 3 and 4 and Figure 2). In the Huisun study area, all the predicted potential CF habitats occurred in the Kuan-Dau watershed and none in the Tong-Feng watershed, because of remarkable differences in humidity and solar illumination between the two. This outcome was confirmed by field surveys, which found almost no cycad-ferns in the Tong-Feng watershed. In contrast, neither the first data-merged models in SS-3 nor the second data-merged models in SS-5 could accurately extrapolate CF spatial patterns when applied to the larger area encompassing Tong-Mao Mountain. Consequently, they could not reduce the area of field survey to less than 10% of the entire study area; with DA it even exceeded 25% (Tables 5 and 6 and Figure 3).
Table 2. SS-4 and SS-5: the accuracies of the models with elevation, slope, and aspect variables for predicting the potential habitat of CFs.

CONCLUSIONS
This study developed four models that related the known CF sites to habitat characteristics and predicted the plant's potential sites in the study area. In the Huisun area, the four models accurately predicted the potential habitats of CFs, substantially reduced the area of field survey to less than 10% of the Huisun study area, and were implemented efficiently. They are therefore well suited to spatial distribution modeling of CFs, and MAXENT performed best, with the highest accuracy and reliability of the four. More importantly, under limited funding and manpower, the models can prioritize either field-survey areas where it is viable to collect fine spatial-resolution microclimatic, edaphic, or biotic data for refining predictions of potential habitat in later rounds of SDM, or search areas for new population discovery. However, as in the case of our entire study area, the outcome showed that predictive models based merely on topographic variables are unlikely to accurately extrapolate the spatial patterns of CFs from one area to another separated by a large distance, or to a larger area encompassing the original one.
Follow-up studies will attempt to incorporate proxy indicators that can be extracted from hyperspectral images or LIDAR DEMs and substituted for direct parameters, so that predictive models can be applied on a broader scale.
(A) SS-1: two-thirds (2/3) of the Sihwufongshan and Duhchuanling dataset is used for base model construction and the remaining (1/3) for model validation (evaluation).
(B) SS-2: the base model built in SS-1 is evaluated only with independent samples taken from the Kuandaushan trail.
(C) SS-3: the samples from the three Huisun sites are merged and the dataset is divided into two subsets, subset-1 containing two-thirds of the dataset for constructing the first data-merged model and subset-2 containing the remaining (1/3) for model evaluation.
(D) SS-4: the first data-merged model built in SS-3 is evaluated only with independent samples from Tong-Mao Mountain.
(E) SS-5: the samples from all four sites are merged and the dataset is divided into two subsets, the first containing two-thirds of the dataset for constructing the second data-merged model and the second containing the remaining (1/3) for model evaluation.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B7, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia
Table 1. SS-1, SS-2, SS-3: the accuracies of the models with elevation, slope, and terrain position variables for predicting the potential habitat of CFs.