COMPARISON OF BIOPHYSICAL AND SATELLITE PREDICTORS FOR WHEAT YIELD FORECASTING IN UKRAINE

Winter wheat crop yield forecasting at national, regional and local scales is an extremely important task. This paper assesses the efficiency (in terms of prediction error minimization) of assimilating satellite and biophysical model based predictors into winter wheat crop yield forecasting models at different scales (region, county and field) for one of the regions in the central part of Ukraine. The vegetation index NDVI, as well as different biophysical parameters (LAI and fAPAR) derived from satellite data and the WOFOST crop growth model, are considered as predictors in the winter wheat crop yield forecasting model. Because the time series of reliable statistics is very short (since 2000), we consider single factor linear regression. It is shown that biophysical parameters (fAPAR and LAI) are preferable as predictors in crop yield forecasting regression models at each scale. The corresponding models possess much better statistical properties and are more reliable than the NDVI based model. The most accurate result in the current study has been obtained for LAI values derived from SPOT-VGT (at 1 km resolution) at county level. At field level, a regression model based on satellite derived LAI significantly outperforms the one based on LAI simulated with WOFOST.


INTRODUCTION
Crop yield forecasting is one of the main components of agriculture monitoring and an extremely important input in enabling food security and sustainable development (Kussul et al., 2011, 2010b; Skakun et al., 2014, 2015). Providing timely and reliable crop yield forecasts is equally important at global, national and regional (local) scales. Currently, there are several operational systems providing crop yield forecasts at global scale. These are the Global Information and Early Warning System (GIEWS) by FAO, the National Agricultural Statistical Service (NASS) and Foreign Agricultural Service (FAS) by USDA, CropWatch by the Chinese Academy of Sciences, and the MARS Crop Yield Forecasting System (MCYFS) by EC-JRC (López-Lozano et al., 2015). These systems are being built using a wide range of data sets and models, including remote sensing data, meteorological observations and crop growth models.
The use of remote sensing data from space for crop yield forecasting is motivated by wide coverage, near-real time delivery of data and products, and the ability to provide different vegetation indicators. Many studies have shown that forecasting models based on remote sensing data can give similar or better performance compared to the more sophisticated crop growth models (Gallego et al., 2012; Kogan et al., 2013a, 2013b; Kowalik et al., 2014). Usually, remote sensing derived indicators are connected to crop yield using empirical regression-based models. Traditionally, vegetation indices such as the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Vegetation Health Index (VHI) are used as input parameters to empirical models (Becker-Reshef et al., 2010; Franch et al., 2015; Kogan et al., 2013a; Kowalik et al., 2014; Salazar et al., 2008). Recently, however, more attention has been paid to the use of biophysical parameters such as leaf area index (LAI) and fraction of absorbed photosynthetically active radiation (fAPAR) (Camacho et al., 2013; Shelestov et al., 2015). It is stated that biophysical parameters reflect the state of the crops more adequately and thus could be better suited for predicting crop yield and production (Duveiller et al., 2013; Kussul et al., 2014; López-Lozano et al., 2015).
López-Lozano et al. (2015) use FAPAR values accumulated over an optimal time period to predict yield for wheat, barley and maize for the European Union and neighboring countries. They find that FAPAR is strongly correlated (R² > 0.6) with yield for all three crop types in water constrained countries. Kussul et al. (2014) compare FAPAR, NDVI and VHI for winter wheat yield forecasting in Ukraine. They find that empirical regression models based on satellite data with biophysical variables (such as FAPAR) are approximately 20% more accurate than the NDVI approach when producing winter wheat yield forecasts at oblast level in Ukraine 2-3 months prior to harvest. Duveiller et al. (2013) use the FAPAR parameter for sugarcane yield prediction in Brazil. They achieve a yield estimation accuracy of around 1.5 t/ha without considering the trend and about 0.6 t/ha when the trend is taken into account.
It should, however, be noted that in many studies satellite data are used at global or national scales. In our previous studies, we have estimated the efficiency of using predictors of different nature (vegetation indices, biophysical parameters, and a crop growth model adapted for the territory of Ukraine) at oblast level (Kogan et al., 2013a, 2013b; Kussul et al., 2013; Kussul et al., 2014). No previous studies assessed the efficiency of satellite-derived indicators at multiple scales. This paper is aimed at addressing this gap. It is devoted to the winter wheat yield forecasting problem in Ukraine at different scales. Since reliable statistical data are available for Ukraine only since 2000, we use a single factor regression model for crop yield estimation. Our previous study has demonstrated the effect of overfitting when more complex models are used (Kogan et al., 2013b). The main goal of the paper is to determine the best predictors for regression models at different scales among satellite and biophysical model (WOFOST) input parameters. We consider three levels of investigation for the best predictor selection in yield forecasting problems: region, county and the fields of a particular farm. An oblast (region) is a sub-national administrative unit that corresponds to the NUTS2 level of the Nomenclature of Territorial Units for Statistics (NUTS) of the European Union; a county corresponds to the NUTS3 level.
Therefore, the objective of the study presented in this paper is to assess the efficiency (in terms of prediction error minimization) of assimilating satellite and biophysical model based predictors into winter wheat crop yield forecasting models at different scales (region, county and field).

STUDY AREA AND DATA DESCRIPTION
A study area in Onufriivka county of Kirovohrad region has been selected for winter wheat forecasting (Figure 1). The region was selected for investigation for several reasons. First, it is one of our points of interest within the SIGMA project (Lavreniuk et al., 2015) and is situated not far from one of the agrometeorological stations, for which we have calibrated the WOFOST model and gathered the corresponding time series of meteorological and phenological parameters since 2000. Second, the county contains fields of the agricultural enterprise "Veres", for which agronomical data have been collected since 2010. For example, Figure 1 shows the location of the "Veres" enterprise fields where winter wheat was grown during one year (shown in yellow), two years (green) and three years (red) since 2010. Therefore, ground measurements for this territory at different scales were available to build and compare winter wheat forecasting models. The following satellite-based predictors for empirical regression crop yield models are used in the study: 16-day NDVI composites derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) at 250 m spatial resolution, and LAI and FAPAR composites from SPOT-Vegetation at 1 km spatial resolution. At the field level, we compare the efficiency of using as a predictor LAI values derived from satellite images and from the WOFOST crop growth model that was adapted and calibrated for the study area.
In our previous study we compared NDVI MOD13Q1 composites and FAPAR derived from SPOT-Vegetation as predictors for crop yield estimation for Kirovohrad region (Kussul et al., 2014). We demonstrated that the most informative NDVI predictors are from the last dekad (ten-day period) of April, and the most informative values of FAPAR are from the last dekad of May. Since the traditional harvest time for winter wheat in Ukraine is July, the use of such predictors allows us to forecast the crop yield 1-2 months before the harvest.

SATELLITE PRODUCTS DESCRIPTION
LAI and FAPAR, used in this study, are free of charge SPOT Vegetation products, which were obtained from the Copernicus Global Land Service (http://land.copernicus.eu). These products are modeled data obtained by processing SPOT Vegetation (SPOT-VGT) satellite imagery. Their temporal coverage spans the period from Dec 1998 to May 2014, and their spatial resolution is 1 km.
The MODIS product MOD13Q1 contains two vegetation indices, NDVI and EVI, which are computed from atmospherically corrected surface reflectance (SR) that has been masked for water and atmospheric effects, i.e. clouds, cloud shadows and heavy aerosols. It is provided every 16 days at 250-meter spatial resolution. In our study we used only the NDVI product obtained from MOD13Q1.
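For readers unfamiliar with the index, NDVI is the normalized difference of near-infrared and red reflectance. The following is a minimal sketch of that standard definition only; it is not the MOD13Q1 processing chain, and the function name and sample reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), or None when the denominator is zero.

    `nir` and `red` are surface reflectances (assumed here to be unitless
    fractions); both arguments are illustrative, not MOD13Q1 band names.
    """
    total = nir + red
    if total == 0:
        return None  # index is undefined for a pixel with no signal
    return (nir - red) / total


# Hypothetical reflectances for a densely vegetated pixel
print(ndvi(0.5, 0.1))
```

Dense green vegetation reflects strongly in the near-infrared and absorbs red light, so its NDVI approaches 1, while bare soil gives values close to 0.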

RESULT ANALYSIS
Yield is estimated as the sum (1) of a trend component (a linear trend (2) is present in the yield time series) and a deviation from the trend caused by the current state of vegetation development. The deviation is in turn estimated with a linear single-factor regression model, where sat_data_i is the information feature (predictor) for year i in the available time series.
According to Table 2, biophysical products (fAPAR and LAI) are preferable as predictors in crop yield forecasting regression models. The corresponding models possess much better statistical properties and are more reliable than the NDVI based model. The most accurate result in the current study has been obtained for LAI values derived from SPOT-VGT (at 1 km resolution) at county scale, averaged using the crop mask derived from the 30 m land cover map (Lavreniuk et al., 2015).
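The formulas for the yield model described at the beginning of this section are not reproduced in the text. A minimal sketch of the decomposition it describes is given here under assumed notation: only sat_data_i comes from the text, while Y_i, T_i, dY_i and the coefficients a, b, c_0, c_1 are hypothetical symbols introduced for illustration.

```latex
\begin{align*}
Y_i  &= T_i + dY_i, \tag{1}\\
T_i  &= a \cdot i + b, \tag{2}\\
dY_i &= c_0 + c_1 \cdot \mathit{sat\_data}_i .
\end{align*}
```

Here Y_i is the yield in year i, T_i is the linear trend, and dY_i is the deviation from the trend, modeled by the single-factor regression on the satellite (or model based) predictor.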
For the region scale models we have compared the efficiency of two different crop masks: GlobCover-2009 at 300 m resolution (Arino et al., 2012) and the crop mask derived from the 30 m resolution land cover map developed by the Space Research Institute NASU-NSAU (Lavreniuk et al., 2015). The statistical properties of the models are nearly the same in both cases (Table 3). This means that the considered crop masks are sufficiently consistent, and that higher accuracy of crop yield estimation could be reached using a dynamic crop mask based on early season crop classification.

DISCUSSION AND CONCLUSIONS
In this paper the problem of winter wheat crop yield prediction is considered at different scales, namely NUTS2, NUTS3 and field level, for the territory of Kirovohrad region of Ukraine. The study area was selected for several reasons: the region is one of the main producers of winter wheat in Ukraine; it is situated in the central part of Ukraine with typical climatic conditions; within the SIGMA project we have collected soil, meteorological and phenological data at the county level; and field level agronomic and statistical data are available for the territory. We therefore have all the information necessary for multi-scale winter wheat crop yield modelling. Since reliable statistics in Ukraine are available only since 2000, we use a single factor linear regression model for crop yield estimation to avoid overfitting. We have investigated 3 kinds of satellite based predictors (NDVI at 250 meter resolution, fAPAR and LAI at 1 km resolution), averaged within a static crop mask at three different scales. As crop masks we used GlobCover and a 30 meter crop mask based on the LandCover map created within the SIGMA project (Lavreniuk et al., 2015). Both crop masks provide quite good results with very similar statistical properties, while higher forecasting accuracy could be reached with dynamic crop masks. We plan to implement this approach in our future work with the use of high-performance computations (Kussul et al., 2009, 2010a, 2012; Kravchenko et al., 2008; Shelestov et al., 2006). The regression model with the best statistical properties is obtained at county level when satellite based biophysical predictors (FAPAR or LAI) are used. The results are consistent with our previous study and with recent results for European territory (López-Lozano et al., 2015).
At field scale, we have also considered a regression model based on LAI simulated with the WOFOST biophysical model. However, the statistical properties of this regression are much poorer than with satellite based predictors. The RMSE of the crop yield prediction based on modeled LAI is at least twice as high as for satellite predictors. We can therefore conclude that the model is not calibrated well enough and does not simulate the real state of the vegetation. Its adequate calibration requires much more phenological, agronomic and local meteorological data. As a rule, such data are missing for agricultural companies, or databases contain only a very limited amount of data. So, in the near future satellite based predictors are expected to be preferable for use in regression models for crop yield estimation.

Figure 1. The area of investigation at different scales: Kirovohrad region, Onufriivka county and fields with winter wheat grown for 1 (yellow), 2 (green) and 3 (red) years since 2010

So in our present study we considered several possible predictors, derived for the period from the last dekad of April till the end of May, for each scale level:
- LAI and FAPAR (SPOT-Vegetation, 1 km resolution) and NDVI (MODIS, 250 m resolution), averaged at region level using the 300 m resolution GlobCover map and the 30-meter LandCover crop mask built within the SIGMA project (Lavreniuk et al., 2015);
- LAI and FAPAR (SPOT-Vegetation, 1 km resolution) and NDVI (MODIS, 250 m resolution), averaged at county level using the 300 m resolution GlobCover map and the 30-meter LandCover crop mask built within the SIGMA project (Lavreniuk et al., 2015);
- LAI and FAPAR (SPOT-Vegetation, 1 km resolution), NDVI (MODIS, 250 m resolution) and modeled LAI from the WOFOST model, averaged at field level.

Table 1. Crop yield statistics and field level crop yield for winter wheat

In this study we consider as predictors such bioparameters as satellite derived LAI and FAPAR, averaged on a dekad basis (3 times per month), and 16-day NDVI composites containing the best pixels for each 16-day period, for cropland within the borders of the region, county and field. As a further predictor we consider LAI time series generated by the WOFOST model, calibrated for a generalized winter wheat field at the test site location (field level), with the same time resolution as the satellite derived LAI product. Vegetation parameters are averaged within the corresponding administrative units using the crop mask with 30-meter resolution, derived from the land cover map developed by SRI within the SIGMA project, and within field borders for the field level.
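The spatial averaging of a predictor within a crop mask, as described above, can be sketched as follows. This is an illustrative sketch only: it assumes the predictor raster and the crop mask have already been resampled to the same grid (the text does not describe how the 1 km products and the 30 m mask are aligned), and the function name, the use of None for missing pixels and the sample values are all hypothetical.

```python
def masked_mean(values, crop_mask):
    """Average predictor pixels that fall on cropland.

    `values` and `crop_mask` are equally shaped lists of rows (assumed to be
    co-registered). Pixels with value None are treated as missing and skipped.
    Returns None when no valid cropland pixel exists.
    """
    selected = [
        v
        for value_row, mask_row in zip(values, crop_mask)
        for v, is_crop in zip(value_row, mask_row)
        if is_crop and v is not None
    ]
    if not selected:
        return None
    return sum(selected) / len(selected)


# Hypothetical 2x3 LAI tile and crop mask for one administrative unit
lai_tile = [[1.0, 2.0, None],
            [3.0, 4.0, 5.0]]
mask = [[True, False, True],
        [False, True, True]]
print(masked_mean(lai_tile, mask))  # averages 1.0, 4.0 and 5.0
```

Applying the same function per dekad (or per 16-day composite) and per unit yields one predictor value per year and scale for the regression.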
Figure 2. Official statistics on winter wheat crop yield for region, county scale and field based crop yield

Figure 3. Linear dependency between county level statistics and field level crop yield data

The very good linear correlation (R² = 0.99) between county level statistics and field level crop yield data allows us to restore missing observations at field level (Table 1) for 2000-2009 and use them for field level model identification. Trend analysis for official statistics on winter wheat crop yield at different scales is shown in Figure 2. County level statistics are absent for the exceptionally dry year 2003. Field level data are available only since 2010. The trends for the region (solid line) and county (dashed line) scales are very similar. The field level trend is not representative, because we have only 4 observations. However, a linear dependency between winter wheat crop yield at field and county scale is observed (Figure 3).
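The restoration of missing field level observations from county level statistics can be sketched as a simple linear fit on the overlapping years. The sketch below is an assumption about how such gap filling might be done, not the procedure actually used in the study; all yield values are made up for illustration and do not come from Table 1.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def restore_field_yield(county, field):
    """Fill years missing in `field` using a county-to-field linear fit.

    Both arguments map year -> yield; observed field values are kept as-is.
    """
    common = sorted(set(county) & set(field))
    intercept, slope = fit_line([county[y] for y in common],
                                [field[y] for y in common])
    restored = dict(field)
    for year, county_value in county.items():
        if year not in restored:
            restored[year] = intercept + slope * county_value
    return restored


# Hypothetical yields (t/ha): field data exist only for 2010-2013
county_yield = {2008: 3.0, 2009: 2.0, 2010: 2.5, 2011: 3.5, 2012: 2.0, 2013: 4.0}
field_yield = {2010: 3.1, 2011: 4.3, 2012: 2.5, 2013: 4.9}
restored = restore_field_yield(county_yield, field_yield)
print(sorted(restored))  # years now covered at field level
```

With only four overlapping years the fit is fragile, so restored values of this kind are best treated as approximate inputs for model identification.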

(... Table 4). For the most important part of the winter wheat vegetation period in the considered region, from the end of April till the end of May, the p-value for this predictor grows significantly, from 0.03 to 0.7, which means that modeled LAI is a meaningless addition to the yield forecasting model at the current level of ground data measurements by farmers. The F-statistic is less than the critical one: the critical level of the F-statistic with λ = 0.01 is 9.07 for a model with 1 predictor at 13 samples. The RMSE of the crop yield prediction based on modeled LAI, obtained within the LOOCV procedure, is at least twice as high as for satellite predictors. We can therefore conclude that the model is not calibrated well enough and does not simulate the real state of the vegetation. Its adequate calibration requires much more phenological, agronomic and local meteorological data. As a rule, such data are missing for agricultural companies, or databases contain only a very limited amount of data. So, in the near future satellite based predictors are expected to be preferable for use in regression models for crop yield estimation.

Figure 4. LAI values derived from SPOT-VGT and WOFOST model at the field level

Table 4. Statistical properties of crop yield regression models with model based and satellite predictors at the field level

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-7/W3, 2015. 36th International Symposium on Remote Sensing of Environment, 11-15 May 2015, Berlin, Germany
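The leave-one-out cross-validation (LOOCV) RMSE mentioned above can be illustrated for a single-factor linear regression as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the predictor and yield deviation values are invented for illustration rather than taken from the study.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_rmse(x, y):
    """RMSE of predictions where each year is predicted by a model
    fitted on all the other years."""
    errors = []
    for k in range(len(x)):
        x_train = x[:k] + x[k + 1:]
        y_train = y[:k] + y[k + 1:]
        intercept, slope = fit_line(x_train, y_train)
        errors.append(y[k] - (intercept + slope * x[k]))
    return sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical data: yield deviations and two candidate predictors
deviation = [2.1, 3.9, 6.2, 7.8, 10.1]
informative_predictor = [1.0, 2.0, 3.0, 4.0, 5.0]  # tracks the deviations
weak_predictor = [3.0, 1.0, 5.0, 2.0, 4.0]         # only loosely related

print(loocv_rmse(informative_predictor, deviation))
print(loocv_rmse(weak_predictor, deviation))
```

Because each year is held out in turn, this error estimate is less optimistic than the in-sample RMSE, which matters for a time series as short as the one used here.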