MAPPING OF ALGAE RICHNESS USING SPATIAL DATA INTERPOLATION

This work describes the generation of a database of algal species richness at two spatial scales: regional (Gulf of Mexico and the Caribbean) and global (coastal zones). As a first approach to the definition of the temporal variability, and to produce the corresponding maps, a previously published decision tree is used to select the best spatial interpolation technique according to the characteristics of the spatial data. The methods presented are ordinary kriging (since no relationship was found with any environmental variable that could function as an external variable) and inverse distance squared, for comparative purposes. The methods used to generate the spatial layers are evaluated using leave-one-out cross validation. Although the evaluation did not find a strong correspondence (in terms of linear regression) between the interpolated and measured values, it was possible to capture the spatial variability of the process and produce the cartography of this variable, with which future ecological analyses can be performed.


Introduction
Species richness can be considered the simplest way to measure biodiversity (Krebs, 1978), and the results can be analyzed using several methods. One such method is the generation of maps, which have been important for describing both the populations and the species found in habitats and areas. Maps have also contributed to interpreting heterogeneous data on the presence of species along diverse environmental gradients. Marine algae are no exception, since mapping species richness has greatly helped to identify the possible environmental and historical factors that explain the patterns of algal diversity both regionally and globally. Investigations on the mapping of species richness are scarce and have been dedicated to identifying variations in diversity measurements along an environmental gradient (e.g., the latitudinal gradient) (Santelices & Marquet, 1998). Meanwhile, the development of geographic information systems has led to the implementation of certain predictors in order to develop algal diversity maps. One method that has been used to predict richness is to determine whether a correlation exists between environmental variables and macroalgae (Keith et al., 2014). Another important method to generate maps of the distribution of algal diversity is to obtain the distribution intervals of algal genera or species and count the number of intervals that overlap in each quadrant or site, using the inverse distance weighted (IDW) interpolator to estimate this parameter (Kerswell, 2006). The classical statistical methods that have been used to estimate species richness ignore the fact that the spatial distribution of species richness presents continuity and a pattern of spatial dependence. This is due to their assumptions of stationarity in space and time, independence among data and identically distributed parameters. Nevertheless, these assumptions are not always accurate. Geostatistical methods, which consider the dependence among contiguous geographic units based on the value of the species richness, have been applied to these types of cases (Crúz-Cárdenas et al. 2013).
Therefore, even though some methodological approaches exist to map patterns of algal species richness, alternatives that are based on different principles are needed. One example is the kriging method pertaining to geostatistics, given that the application of geographic information systems to the study of the diversity of these organisms is fully developed. One of the ways in which this tool is useful to biology is in identifying areas that are of interest because of their high or low diversity. By identifying the hotspots present in the maps generated, potential areas for the conservation of species or endemism can be evaluated (Escalante, 2003), as well as the potential distribution of a particular species. Important information can also be obtained to determine the coastal zones that need a better evaluation of algal diversity. Another factor to be considered when conducting biogeographical investigations is the distribution patterns of species richness (Morrone, 2009). These methods may operate differently depending on the geographic scale and the taxonomic level of the study (Kerswell, 2006). Thus, the scales and biological models used in the present work are: (1) on a global scale, the genus Laurencia, which comprises red algal species and has a tropical, subtropical and temperate distribution interval (Sentíes & Fujii, 2002); and (2) on a regional scale, large macroalgae on the coasts of the Gulf of Mexico and the Mexican Caribbean (GMC), a biological group that may represent a potentially important resource for the country (Dreckmann & Sentíes, 2013).
The use of ordinary kriging (OK) enables taking into account local variations in the mean, limiting the stationarity domain to a local area Ω around the position x where the variable will be estimated.
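The expression Z(x) = Y(x) + m(x) represents a stochastic process with a variable mean m(x) and covariance function C(h). As such, Y(x) is a stochastic process with a null mean. A linear estimator is a linear combination of the n measurements Z(x_k) available in Ω:

$$Z^{*}(x) - m(x) = \sum_{k=1}^{n} \lambda_k \left[ Z(x_k) - m(x_k) \right]$$

If the mean is constant in domain Ω, then it can be eliminated from the above equation by forcing the sum of the kriging weights λ_k to equal 1. In such a case, the estimator is called ordinary kriging and is expressed as:

$$Z^{*}_{OK}(x) = \sum_{k=1}^{n} \lambda_k Z(x_k), \qquad \sum_{k=1}^{n} \lambda_k = 1$$

The optimal weights that minimize the variance of the estimation error are obtained using the Lagrange multiplier method, with multiplier μ (Goovaerts, 1997, p. 133), which results in the following system of equations:

$$\sum_{j=1}^{n} \lambda_j C(x_i - x_j) + \mu = C(x_i - x), \quad i = 1, \dots, n; \qquad \sum_{j=1}^{n} \lambda_j = 1$$

This system can be written as:

$$\begin{bmatrix} C(x_1 - x_1) & \cdots & C(x_1 - x_n) & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C(x_n - x_1) & \cdots & C(x_n - x_n) & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix} \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \mu \end{bmatrix} = \begin{bmatrix} C(x_1 - x) \\ \vdots \\ C(x_n - x) \\ 1 \end{bmatrix}$$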
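As an illustration of how the ordinary kriging weights can be obtained in practice, the following is a minimal sketch that builds the covariance matrix bordered by the unbiasedness constraint and solves for the weights and the Lagrange multiplier. The exponential covariance model, its parameters, the planar coordinates and the sample values are all assumptions made for the example; they are not taken from this study.

```python
import math

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Assumed exponential covariance model C(h); parameters are illustrative."""
    return sill * math.exp(-h / corr_range)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(points, values, target, cov=exp_cov):
    """Return (estimate, weights, mu) at target by solving the OK system."""
    n = len(points)
    # Data-to-data covariances, bordered by a column and a row of ones
    # that enforce the constraint sum(weights) == 1.
    a = [[cov(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Data-to-target covariances, followed by the constraint value 1.
    b = [cov(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights, mu

# Hypothetical localities (planar x, y) and richness values.
pts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
vals = [3.0, 8.0, 5.0, 12.0]
est, w, mu = ordinary_kriging(pts, vals, (5.0, 5.0))
```

Because the target in this example is equidistant from the four samples, the weights are equal and the estimate reduces to the arithmetic mean. Estimating at a sampled location returns the measured value, since kriging is an exact interpolator.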

Method
The species richness values of the Laurencia genus were obtained at the global level by consulting the primary literature that has reported on the species of this genus (e.g., taxonomic monographs, floristic catalogs, species checklists and isolated records) as well as the online database AlgaeBase (Guiry & Guiry, 2015). The sites registered were georeferenced and the taxonomic validity of the species was verified. The compilation of these reports resulted in the specific richness value (number of species) for each one of the localities. A total of 130 species were reported in 501 localities worldwide, with values from 1 to 29 species per locality.
With respect to the regional scale (GMC), a checklist was made of the large macroalgae species (macroalgae 10 to 100 cm in size) present in the study area. These sizes were selected based on the Littler & Littler (2000) identification guide. A database was built with these data and the localities where these organisms have been reported, their georeferencing and the specific richness of each one. This process was primarily based on specialized catalogues of the study region produced by Ortega et al. (2002) and Dreckmann (1998), complemented with information from records reported in the AlgaeBase database (Guiry & Guiry, 2015) and various recent publications. A total of 110 species were included in the database, representing large macroalgae with valid taxonomies pertaining to the phyla Rhodophyta, Chlorophyta and Ochrophyta. They were distributed throughout 118 localities, and richness values ranged from 1 to 87 species per locality.
A decision tree was used to select the spatial analysis technique (Hengl, 2009). First, a determination was made as to whether the variable of interest had a linear relationship with an environmental variable. The variables tested included surface temperature obtained from a MODIS sensor. Since no relationship was found, an interpolation procedure was performed using ordinary kriging, given that it was possible to estimate a semivariogram from the empirical data. A mechanical interpolation method, inverse distance squared, was also applied for comparative purposes. The results were evaluated with leave-one-out cross validation.
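The comparison procedure described above can be illustrated with a minimal sketch of inverse distance squared interpolation evaluated by leave-one-out cross validation. The function names, the planar coordinates and the richness values are hypothetical, and the squared Pearson correlation is used here as one possible measure of agreement; this is not presented as the exact computation carried out in the study.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate; power=2 gives inverse distance squared."""
    num = den = 0.0
    for p, z in zip(points, values):
        d = math.dist(p, target)
        if d == 0.0:
            return z  # target coincides with a sample: return the measured value
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den

def leave_one_out(points, values, predictor):
    """Predict each sample from all the remaining samples."""
    preds = []
    for i in range(len(points)):
        rest_p = points[:i] + points[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        preds.append(predictor(rest_p, rest_v, points[i]))
    return preds

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)

# Hypothetical localities (planar x, y) and richness counts, for illustration only.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
rich = [2.0, 4.0, 9.0, 3.0, 5.0, 11.0]
loo = leave_one_out(pts, rich, idw)
r2 = r_squared(rich, loo)
```

The same `leave_one_out` helper accepts any predictor with the signature `(points, values, target)`, so a kriging estimator could be evaluated in exactly the same way and its R² compared with that of IDW.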

Results and Discussion
The results from the two exercises performed at different scales are presented next. As mentioned, the decision tree model by Hengl (2009) was followed. No significant correlation was found between the species richness data corresponding to the Laurencia genus and marine surface temperature (annual, January and July). The same findings were obtained with the species richness data for the GMC region. Nevertheless, when fitting a model to the experimental semivariogram (Figures 1a, 1b), a spatial correlation of the data was identified for both cases. The cross validation analysis of the values obtained by the two interpolation methods showed a significant correlation (Table 1) between the known and predicted richness values. Although the values of R² were low for a spatial analysis (Table 1), they were similar for the two interpolators at both scales, and both interpolators captured the spatial variability of the species richness well. This can be seen in the definition of the richness pattern in the maps generated (Figs. 2 and 3).
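For reference, an experimental semivariogram of the kind mentioned above is commonly computed with the classical estimator γ(h) = (1 / 2N(h)) Σ (z_i − z_j)², where the sum runs over the N(h) pairs of samples separated by approximately h. The following is a minimal sketch under the assumption of planar coordinates and fixed-width distance bins; the function name, the lag width and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, values, lag_width):
    """Return {lag_center: gamma} using the classical estimator over distance bins."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            k = int(h // lag_width)  # index of the distance bin for this pair
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    # Semivariance per bin: half the mean squared difference of the pairs in it.
    return {(k + 0.5) * lag_width: sums[k] / (2 * counts[k]) for k in sorted(sums)}

# Three hypothetical samples along a transect.
gamma = empirical_semivariogram([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
                                [1.0, 2.0, 4.0], lag_width=1.0)
```

A semivariance that increases with the lag distance, as in this small example, is what indicates spatial dependence; a model (for instance spherical or exponential) is then fitted to these points before kriging.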
Given the spatial autocorrelation detected, the decision tree by Hengl (2009) indicates that the species richness should be estimated using OK interpolation. Nevertheless, the results were very similar when applying the IDW method. At both the global and regional levels, the two methods also identified the same locations as having the largest number of species (Figs. 2 and 3).
One difference between the results from the two interpolators is seen in the estimated size of the areas that contain the highest species richness. For example, for the GMC, the area estimated using the IDW interpolator was larger than with OK (Fig. 3a, 3b), whereas the opposite occurred with the interpolation at the global scale (Fig. 2a, 2b). For the GMC, the IDW and OK estimations of low species richness also differ: the former estimated large areas with values near zero (Fig. 3b), whereas the latter resulted in values near the middle interval (8-15 species) for these areas (Fig. 3a). This may indicate an underestimation of the diversity by IDW and an overestimation by OK.
Did the analyses using OK and IDW function in this study, even though the R² values were low for a spatial analysis? The answer to this question is "yes", given that the maps obtained adequately represent distribution patterns that can be interpreted in terms of ecological (environmental) and historical (geological) factors. Nonetheless, it is necessary to take into account that these organisms inhabit coastal zones, which can influence the analysis and means that their distribution is better represented by a coastline than by an area. This disadvantage can be added to the discussion about the geographic unit used in the spatial analysis of diversity. According to Murguía (2005), the scale can create a problem for defining the geographic unit. In light of this study, it can be said that the distribution of the organisms is highly important to the definition of the geographic unit.

Conclusions
The estimators were determined to be significant for both scales and interpolators (IDW and OK). They were similar in terms of capturing the distribution of the species richness. Therefore, in this case, OK was as effective as IDW. Geostatistical methods can continue to be tested by studies similar to the present work through improvements such as generating a more detailed database of algal richness. Other environmental variables that may be associated with the variable of interest can also be tested, such as phosphate concentrations, turbidity, salinity, pH, nitrates, oxygen and dissolved carbon dioxide levels, type of sediment and calcite concentrations. It is very important to evaluate correlations with these variables based on temporal scales or average levels. The design of the analysis should consider not only the scale of the geographic unit but also the way in which the organisms are distributed in an area. Furthermore, how the biological model in question is distributed must also be considered, since organisms can be spread over areas or along coastlines, as in the case of macroalgae. Continuing to perform floristic studies will improve estimations of the spatial variability of the species richness of these organisms. In turn, this will contribute to future biogeographic studies and will serve as a useful tool for making decisions about the conservation of algal diversity.

where μ denotes the Lagrange multiplier. Alternatively, if the relation between the covariance function and the semivariogram function γ(h)

Figure 1. Semivariogram fitted to the total species richness for a) the Laurencia genus and b) large macroalgae in the Gulf of Mexico and the Caribbean.

Figure 2. Spatial distribution maps of species richness for the Laurencia genus at the global level. A) Estimation with the OK interpolator and B) with IDW.

Figure 3. Spatial distribution maps of species richness for large macroalgae in the Gulf of Mexico and the Mexican Caribbean. A) Estimation with the OK interpolator and B) with IDW.