SPATIOTEMPORALLY IMPROVING THE RURAL ACCESS INDEX – A REMOTE SENSING BASED APPROACH

Many countries, especially in the Global South, still lack the ability to effectively pursue basic policies, which can lead, in the worst case, to armed conflicts. Access to markets is a key factor for economic growth and an important component in reducing poverty. SDG indicator 9.1.1 addresses the proportion of the rural population who live within 2 km of an all-season road, which can be mapped by the Rural Access Index (RAI), introduced by the World Bank in 2006. Calculating the RAI requires the network of so-called all-season roads, the population distribution and the delineation of rural areas. We developed a fully automated approach, using remote sensing and other open source data, to calculate the RAI on an annual basis between 2013 and 2020 for the Lake Chad region. We achieved an overall accuracy between 97.0% and 97.5% in detecting all-season roads using a Random Forest classification. Our method shows similar results to those published by the World Bank. However, our approach provides a higher spatial and temporal resolution in measuring the RAI than previous studies and is independent of field studies.


INTRODUCTION
Countries that lack the ability to effectively pursue domestic or international policies are often at risk of becoming what is known as a 'fragile state'. To assess this risk, scientists use the concept of state capacity, a measure that is closely connected to the fragile state concept (Besley and Persson, 2010). It measures the ability of a state to, e.g., ensure political stability, levy taxes and provide essential health or educational supplies to its population (Woodward, 2004, Müller-Crepon, 2021). One of these key factors, access to markets, is crucial to economic growth and an important component in reducing poverty. The rural population in particular is often insufficiently connected to a well-developed infrastructure. The Sustainable Development Goal (SDG) 9.1 also accounts for this factor and includes the development of "[...] reliable, sustainable and resilient infrastructure, including regional and transborder infrastructure, to support economic development and human wellbeing, with a focus on affordable and equitable access for all" (United Nations, 2022). The connection of the rural population to infrastructure was translated into the Rural Access Index (RAI) by Roberts et al., 2006. The RAI estimates the proportion of the rural population with adequate access to the transport system, thus directly mapping SDG indicator 9.1.1 (World Bank Group, 2016). Precisely, the RAI is defined as the proportion of the rural population living within two kilometers (∼ 20-25 minutes walking time) of an all-season road. All-season roads are defined as being navigable throughout the year and are only temporarily impassable due to weather conditions.
The initial approach to calculating the RAI was based on household surveys (Roberts et al., 2006). In 2015, the World Bank designed a new GIS-based method to provide more accurate and cost-effective results by using specific datasets, i.e. the population distribution, road infrastructure and the condition of these roads (World Bank Group, 2016). According to the World Bank Group, 2016, roads are all-season roads if they are either paved (with an International Roughness Index (IRI) of less than 6 meters/km or in excellent, good or fair condition) or unpaved (with an IRI of less than 13 meters/km or in excellent or good condition). Such an assessment requires an on-site survey of the roads or, most often, a subjective assessment based on high-resolution imagery, if available. For the RAI calculation, the World Bank proposes the use of different data sets, some of which are publicly available, while others are restricted or, depending on the study area, proprietary. The RAI is made partially available for various years at country level, in some cases at Administrative Level 1 (admin-1 level) (Workman and McPherson, 2019, World Bank Group, 2019). Other studies calculated the RAI for only one point in time (Akin and Demirel, 2019, World Bank Group, 2016), for a specific subregion (Akin and Demirel, 2019), only partly using remote sensing data (Akin and Demirel, 2019, World Bank Group, 2016, Workman and McPherson, 2019), or in a GIS environment (Workman and McPherson, 2021, World Bank Group, 2016).
Our approach consists of using a standardized, objective data basis, including remote sensing data, in order to calculate the RAI consistently across countries with a higher temporal and spatial resolution than provided to date. Only open source data is used as part of our framework, which allows researchers and analysts to gather and analyze large amounts of data without incurring significant costs. The OpenStreetMap (OSM) database is used as the basis for the road network. Since about 72% of the roads within our study area in 2013, and still about 40% in 2020, do not contain information about the road surface, a Random Forest classification was carried out on Landsat 8 and Sentinel-2 data for each year to identify paved pixels. For this paper, we focus on an annual calculation of the RAI between 2013 and 2020 in the Lake Chad region, which includes parts of Niger, Chad, Cameroon, and Nigeria. As spatial units we use both admin-1 regions and the PRIO-GRID cells therein; PRIO-GRID is a global vector matrix commonly used in the social sciences (Tollefsen et al., 2012). In summary, the study specifically addresses the following research questions: (I) Is the differentiation between paved and unpaved road pixels possible by conducting a land cover classification? (II) Can adequate RAI results be obtained using remote sensing and other open source data? (III) Is it possible to provide a higher temporal and spatial resolution of the RAI than previous studies show?

Study Area
For this paper, we focus on the Lake Chad region, which covers the cross-border region of Niger, Chad, Cameroon, and Nigeria. A total of eleven admin-1 regions are considered: one in Niger, two in Nigeria, two in Cameroon and six in Chad (Figure 1). All of these countries have been affected by the violent terrorism of Boko Haram, which has built its success on socioeconomic and political dysfunction (United Nations Development Programme, 2022). The conflict has resulted in more than 2.9 million people being displaced, and according to UN OCHA, about 5.6 million people are directly facing food insecurity (UN OCHA, 2022). Insufficient basic social services and the already low availability of natural resources are further worsening the situation. Due to years of prevailing violence and insecurity, thousands of children lack education; UN OCHA estimates that more than 1,000 schools are closed (UN OCHA, 2022). The population derives its income mainly from agriculture and fishing (Masaki and Rodríguez-Castelán, 2021). In some cases, the poverty rate in the region is up to three times higher than in the rest of the neighbouring countries, reaching values between 31% and 72% (Masaki and Rodríguez-Castelán, 2021).
Figure 1 (data sources include Tollefsen et al., 2012 and OpenStreetMap contributors, 2022).

Data
Only open source data is used as part of our framework. The gridded population count dataset from WorldPop, available annually since 2000, is taken as population distribution data (WorldPop and CIESIN, 2018). The mapping approach used is a Random Forest-based dasymetric redistribution (Stevens et al., 2015). This involves matching census-based population counts with the corresponding administrative units and disaggregating them to 3 arc-second (∼100 x 100 m) grid cells using machine learning techniques that exploit relationships between population densities and a set of spatial covariates (Lloyd et al., 2019). As a second dataset, the Global Human Settlement Layer of the Joint Research Centre (JRC) from 2015, which is based on satellite and population data and has a spatial resolution of 1 km, is used to identify rural regions (Pesaresi et al., 2016). The dataset from 2015 was chosen because it is the closest to our time period. Only pixels that are not classified as urban centers or urban clusters according to the DEGURBA (Degree of Urbanisation) classification scheme, and thus have a population density of less than 300 inhabitants per km², are considered in our study (OECD et al., 2021). The road network is obtained from OpenStreetMap (OSM) and queried for the corresponding retrospective years through the ohsome API. This API is a data analysis tool, provided by the Heidelberg Institute for Geoinformation Technology (HeiGIT), for working with OSM historical data on a global scale. It allows querying and aggregating the recorded vector data and provides insights into their spatial and temporal evolution and the associated annotated attributes (Raifer et al., 2019).
Atmospherically corrected surface reflectance from Sentinel-2 MSI and Landsat 8 OLI/TIRS time series data is used for road classification. The data is provided along with a quality assessment (QA) band that indicates the presence of clouds (USGS, 2019, ESA, 2015). In addition, the ESA WorldCover 10m classification and the JRC Global Surface Water Mapping Layer are used to mask water and wetland areas from the satellite mosaics. The ESA WorldCover 10m provides a global land cover classification for 2020, based on Sentinel-1 and Sentinel-2 data. The classification consists of eleven land cover classes, one of which designates permanent water bodies (Zanaga et al., 2021). The JRC Global Surface Water Mapping Layer provides information about the spatial and temporal distribution of surface water with a spatial resolution of 30 meters between 1984 and 2021, based on Landsat 5, 7 and 8 data. Both permanent and seasonal water surfaces are recorded (Pekel et al., 2016).
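How a QA band can be used to discard cloudy observations may be illustrated with a minimal sketch. It assumes the Sentinel-2 QA60 convention, in which bit 10 flags opaque clouds and bit 11 flags cirrus; the text does not state which QA bits were evaluated, and the Landsat 8 QA band uses different bit positions, so the function names and default bits below are illustrative only.

```python
def is_cloudy(qa_value, cloud_bits=(10, 11)):
    """Return True if any of the given flag bits is set in a QA value.

    The default bits follow the Sentinel-2 QA60 layout (assumption):
    bit 10 = opaque cloud, bit 11 = cirrus.
    """
    return any((qa_value >> bit) & 1 for bit in cloud_bits)


def mask_pixels(reflectance, qa, cloud_bits=(10, 11)):
    """Replace cloudy observations with None so later steps can skip them."""
    return [
        None if is_cloudy(q, cloud_bits) else r
        for r, q in zip(reflectance, qa)
    ]


if __name__ == "__main__":
    # Three hypothetical pixels: clear, opaque cloud, cirrus.
    values = [0.12, 0.45, 0.30]
    qa = [0, 1 << 10, 1 << 11]
    print(mask_pixels(values, qa))
```

For a sensor with a different QA layout, only the `cloud_bits` argument would need to change.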
The calculations are performed both at admin-1 level (Global Administrative Areas, 2012), the largest subnational administrative unit of a country, and on PRIO-GRID, a global vector matrix with a spatial resolution of 0.5 x 0.5 decimal degrees that is often used in social science research because it is insensitive to political boundaries (Tollefsen et al., 2012).
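Assigning a location to a 0.5 x 0.5 degree grid cell can be sketched as follows. The (row, column) indexing from the lower-left corner of the globe is a hypothetical convention chosen for illustration; it is not the official PRIO-GRID cell identifier, and the function name is an assumption.

```python
import math


def grid_cell(lon, lat, cell_size=0.5):
    """Return the (row, col) index of the grid cell containing a point.

    Rows are counted northwards from latitude -90 and columns eastwards
    from longitude -180 (illustrative convention, not the PRIO-GRID id).
    """
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates outside the valid lon/lat range")
    n_cols = int(round(360.0 / cell_size))
    n_rows = int(round(180.0 / cell_size))
    # Points exactly on the eastern or northern edge fall into the last cell.
    col = min(int(math.floor((lon + 180.0) / cell_size)), n_cols - 1)
    row = min(int(math.floor((lat + 90.0) / cell_size)), n_rows - 1)
    return row, col


if __name__ == "__main__":
    # Approximate location of N'Djamena (lon 15.05, lat 12.13).
    print(grid_cell(15.05, 12.13))
```

Grouping population pixels by such an index is one simple way to aggregate values per grid cell independently of administrative boundaries.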

Random Forest and Road Classification
Within our approach, we defined all-season roads as paved roads in order to calculate the RAI and to determine the condition of the roads. Even though the literature refers to paved roads as all-weather roads rather than all-season roads, the former being rated to a higher standard (World Bank Group, 2016, Workman and McPherson, 2019), this definition means that our approach does not rely on time-consuming in situ data that would have to be collected annually. To acquire information about the road surface, a yearly Random Forest classification was carried out on Landsat 8 (between 2013 and 2018) and Sentinel-2 (for 2019 and 2020) data, in which paved pixels were identified. Figure 2A shows a schematic overview of the workflow.
As a first pre-processing step, annual multitemporal composite mosaics are created based on the available imagery. Secondly, cloud masking is performed. The provided QA band contains [...] The Normalized Difference Vegetation Index (NDVI) is used to highlight the signal from vegetation, which reflects strongly in the NIR band (Kamal et al., 2015). The Modified Normalized Difference Water Index (MNDWI) helps to delineate water and artificial surfaces (e.g. asphalt roads). Both surfaces have similar spectral properties and reflect especially in the green wavelength range; the MIR band used improves the distinction between the two (Xu, 2007). The Normalized Difference Built-up Index (NDBI) is particularly useful in detecting built-up areas (Zha et al., 2003). The Burned Area Index (BAI) is additionally used to identify burn scars (Chuvieco et al., 2002).
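For reference, the four indices named above are commonly written as follows in the cited literature. The band symbols are generic, with MIR denoting the mid-infrared (shortwave-infrared) band; which sensor band is assigned to each symbol is an assumption and is not taken from the text.

```latex
\begin{align}
\mathrm{NDVI}  &= \frac{NIR - Red}{NIR + Red} \\
\mathrm{MNDWI} &= \frac{Green - MIR}{Green + MIR} \\
\mathrm{NDBI}  &= \frac{MIR - NIR}{MIR + NIR} \\
\mathrm{BAI}   &= \frac{1}{(0.1 - Red)^2 + (0.06 - NIR)^2}
\end{align}
```

The three normalized-difference indices are bounded between -1 and 1, whereas the BAI is unbounded above and grows as a pixel approaches the reference reflectance of burned surfaces (0.1 in Red, 0.06 in NIR).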
As paved roads also have a lower spectral reflectance in the Red band than vegetation, the NDVI also helps to distinguish asphalt from other land covers (Kamal et al., 2015). During the development of the method, it became apparent that burned areas were often classified as paved roads, which is prevented by incorporating the BAI. The indices are calculated following the cited definitions. After randomizing, the training collection is split into 70% training and 30% testing points. A Random Forest classifier with 10 trees is then trained, resulting in an overall training accuracy of 95.80% and an overall testing accuracy of 69.44%. The classification is used as input for assigning the road surface to the annual OSM road network. A road was classified as paved by a winner-takes-all approach, i.e. if most of the road pixels were classified as such. However, if a definite road surface attribute was given by an OSM user (e.g. asphalt, paved, concrete, bitumé, unpaved, sand, earth, dirt, gravel, pebblestone), this attribute was used instead, accounting for the users' expert knowledge (Figure 2B).
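The surface assignment rule described above can be sketched in a few lines. The grouping of the listed OSM surface values into paved and unpaved sets is an assumption based on the order in which they are listed, the handling of ties and of roads without classified pixels is a design choice of this sketch, and all names are hypothetical.

```python
# Assumed grouping of the OSM surface values listed in the text.
PAVED_TAGS = {"asphalt", "paved", "concrete", "bitumé"}
UNPAVED_TAGS = {"unpaved", "sand", "earth", "dirt", "gravel", "pebblestone"}


def road_surface(pixel_labels, osm_surface=None):
    """Assign 'paved', 'unpaved' or 'unknown' to a single road.

    pixel_labels: classified labels of the pixels along the road.
    osm_surface:  optional OSM surface tag; a recognised tag overrides
                  the pixel-based decision.
    """
    if osm_surface:
        tag = osm_surface.strip().lower()
        if tag in PAVED_TAGS:
            return "paved"
        if tag in UNPAVED_TAGS:
            return "unpaved"
    if not pixel_labels:
        return "unknown"
    paved = sum(1 for label in pixel_labels if label == "paved")
    # Winner-takes-all: paved only with a strict majority of paved pixels.
    return "paved" if paved > len(pixel_labels) / 2 else "unpaved"


if __name__ == "__main__":
    print(road_surface(["paved", "paved", "other"]))         # pixel majority
    print(road_surface(["paved", "paved", "paved"], "sand"))  # tag override
```

Letting a recognised tag take precedence mirrors the priority given to user-provided attributes; an unrecognised tag simply falls back to the pixel majority.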

Calculating the Rural Access Index
The calculation of the RAI follows the approach suggested by the World Bank Group, 2019. The RAI represents the percentage of the rural population ($RP$) living within 2 km of an all-season road ($RP_{road}$):
$$\mathrm{RAI} = \frac{RP_{road}}{RP} \times 100$$
To this end, the sum of the rural population within the Area of Interest (AOI) is determined first. Subsequently, all roads marked as all-season roads within the rural regions are buffered by 2 km, and the population is then summed within these buffers (Figure 2C).
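The buffer-and-sum logic can be illustrated with a small, self-contained sketch. It assumes that rural population cells are given as points with projected coordinates in metres and that all-season roads are given as straight line segments; a point counts as served if its distance to any segment is at most 2 km, which is equivalent to falling inside a 2 km buffer. The data structures and function names are hypothetical and do not reproduce the raster-based workflow of the study.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment A-B (same units)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Projection of P onto the line through A and B, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rural_access_index(cells, roads, buffer_m=2000.0):
    """RAI in percent, or None if there is no rural population.

    cells: iterable of (x, y, population) for rural cells, in metres.
    roads: iterable of ((ax, ay), (bx, by)) all-season road segments.
    """
    cells = list(cells)
    roads = list(roads)
    total = sum(pop for _, _, pop in cells)
    if total == 0:
        return None
    served = sum(
        pop
        for x, y, pop in cells
        if any(
            point_segment_distance(x, y, ax, ay, bx, by) <= buffer_m
            for (ax, ay), (bx, by) in roads
        )
    )
    return 100.0 * served / total


if __name__ == "__main__":
    road = [((-10000.0, 0.0), (10000.0, 0.0))]
    cells = [(0.0, 1000.0, 100), (0.0, 3000.0, 50), (5000.0, 1500.0, 50)]
    print(rural_access_index(cells, road))
```

In this toy example, 150 of 200 inhabitants live within 2 km of the road, so the sketch yields an RAI of 75%.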

Accuracy Assessment
For the accuracy assessment, stratified random sampling was used to generate reference points for 2017 and 2019, evaluating the recorded all-season roads. These two years were chosen to include both Landsat 8 and Sentinel-2 classifications. All-season roads are defined here as those captured by the Random Forest classification and related OSM attributes, as described in chapter 2.3.1. As paved roads represent a rather small proportion of the study site compared to other land surfaces, the stratified random sampling method allows us to increase the proportion of the underrepresented paved class (Olofsson et al., 2014). Our requirement was to have at least 100 samples in each stratum, which could be ensured by a proportional redistribution of the sample size. In total, 4,125 samples were distributed randomly throughout the whole study area for the two points in time, representing the strata paved and unpaved. The points were labeled as either paved or unpaved by expert interpretation of Google Earth high-resolution imagery of the respective year.
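One plausible reading of the allocation rule (a minimum number of samples per stratum, with the remainder distributed proportionally) is sketched below. The stratum weights and the total in the example are hypothetical, and the exact redistribution procedure used in the study is not specified in the text.

```python
def allocate_samples(stratum_weights, total, minimum=100):
    """Distribute `total` samples over strata proportionally to their
    weights while guaranteeing at least `minimum` samples per stratum."""
    if minimum * len(stratum_weights) > total:
        raise ValueError("total is too small for the requested minimum")
    if any(w <= 0 for w in stratum_weights.values()):
        raise ValueError("stratum weights must be positive")

    allocation = {}
    remaining = dict(stratum_weights)
    remaining_total = total
    while True:
        weight_sum = sum(remaining.values())
        shares = {
            k: remaining_total * w / weight_sum for k, w in remaining.items()
        }
        too_small = [k for k, s in shares.items() if s < minimum]
        if not too_small:
            break
        # Fix undersized strata at the minimum, then redistribute the rest.
        for k in too_small:
            allocation[k] = minimum
            remaining_total -= minimum
            del remaining[k]

    for k, s in shares.items():
        allocation[k] = int(s)  # round down; leftovers are handled below
    leftover = total - sum(allocation.values())
    if leftover:
        pool = shares if shares else stratum_weights
        allocation[max(pool, key=pool.get)] += leftover
    return allocation


if __name__ == "__main__":
    # Hypothetical area shares: 1% paved, 99% unpaved.
    print(allocate_samples({"paved": 0.01, "unpaved": 0.99}, 2000))
```

With purely proportional allocation, the paved stratum in this example would receive only 20 points; the minimum raises it to 100 while keeping the total unchanged.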

Rural Access Index
The results are shown in Figure 3 and Figure 5 at admin-1 level and in Figure 4 for PRIO-GRID cells. On average, the [...] In Kanem (Chad), where population density is very low, the smallest increase compared to other regions was observed, from 0.00% to 1.53%. This becomes even more noticeable at PRIO-GRID level, where grid cells containing larger settlements (such as Biu in south-western Borno, Nigeria, Guélengdeng in central north Mayo-Kebbi Est, Chad or N'Djamena in Ville de N'Djamena, Chad) stand out clearly (RAI between 28% and 82% over time). The same applies to grid cells containing main road axes to larger nearby cities (such as PRIO-GRID cells in the southern part of Nord, Cameroon, where Ngaoundéré is a nearby city). Similarly, it is notable that higher RAI values at admin-1 level often result from individual high values combined with a large number of unpopulated PRIO-GRID cells contained therein (e.g. Diffa in Niger). In general, the RAI never exceeds 35% during the analysis period, with the exception of Ville de N'Djamena (Chad).
Statistics can be viewed in more detail in Table 2. A positive trend of 2.33% can be observed for the whole region regarding the rural population's walkable access to all-season roads.
For the entire AOI, the RAI is 17.17%. Regions in Cameroon rank the lowest, with a mean value of 11.33%, while Nigerian regions rank the highest, with 22.79%. Chadian regions show the widest range of RAI values, from 0.00% to 81.01%, with the high values being attributed to Ville de N'Djamena.

Overall Classification Performance
The accuracy assessment for 2017 results in an overall accuracy of 97.0%, and the 2019 verification shows an overall accuracy of 97.5%. Cohen's Kappa is 61.6% for 2017 and 66.3% for 2019, with values between 61.0% and 80.0% reflecting substantial agreement (Landis and Koch, 1977). For the paved class, the precision is 54.0% in both years, which means a slight overestimation of paved roads. The recall is 76.1% and 90.0% for 2017 and 2019, respectively. The unpaved class has a precision of 99.1% and 99.7% for 2017 and 2019, respectively, and a recall of 97.7% in both years (see Table 3). Comparing our findings with the World Bank's results, which have been published for parts of our study area, we find an overall strong agreement. For 2014, we obtain an RAI of 15.5% for the admin-1 region of Borno (Nigeria) and 15.9% for Adamawa (Nigeria); the corresponding World Bank values are 14.9% and 12.8%, respectively (World Bank Group, 2019).
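How these measures follow from a two-class confusion matrix can be shown with a short sketch. The counts in the example are hypothetical and were chosen only to be of a similar order of magnitude as the reported values; they are not the confusion matrix of the study, and any area weighting of the sample counts is not reproduced here.

```python
def accuracy_metrics(tp, fp, fn, tn):
    """Accuracy measures for a binary paved/unpaved classification.

    tp: reference paved,   classified paved
    fp: reference unpaved, classified paved
    fn: reference paved,   classified unpaved
    tn: reference unpaved, classified unpaved
    """
    n = tp + fp + fn + tn
    overall = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "overall_accuracy": overall,
        "kappa": (overall - expected) / (1 - expected),
        "precision_paved": tp / (tp + fp),
        "recall_paved": tp / (tp + fn),
        "precision_unpaved": tn / (tn + fn),
        "recall_unpaved": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in accuracy_metrics(tp=54, fp=46, fn=6, tn=1894).items():
        print(f"{name}: {value:.3f}")
```

The example shows why a very high overall accuracy can coexist with a moderate Kappa and a low precision for the paved class: the unpaved class dominates the sample, so chance agreement is already high.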

DISCUSSION
The main objective of this study was the application of an automated workflow using freely available remote sensing and geodata to calculate the Rural Access Index for admin-1 areas and PRIO-GRID cells for the Lake Chad region between 2013 and 2020. The comparison between the two spatial units, admin-1 regions and PRIO-GRID cells, has shown that the higher spatial resolution of the PRIO-GRID provides more accurate results with respect to the RAI. Thus, changes can be observed and analyzed more accurately and may serve as valuable input for decision makers.
Based on Landsat 8 and Sentinel-2 mosaics as well as the additional use of four spectral indices over the time period considered, a Random Forest classification was performed to detect paved pixels. Compared to in situ surveys, the remote sensing data used has the advantage of being available free of charge over a long period of time and of allowing an automated classification procedure. The method used is reproducible over different areas of interest; only the training samples would have to be adjusted, depending on the land use of the region. In general, the distinction between paved roads and other land uses (e.g. roofs, water) is difficult because the spectral properties are often similar (Xu, 2007, Kavzoglu et al., 2009). This could be solved by the initial masking of water areas and by using historical OSM road data as the road database.
Due to local wind drifts, asphalt road sections may be covered with sand or vegetation and identified as such. In order to minimize such misclassifications, all OSM roads were assigned the attribute paved either if the majority of pixels on a road were detected as asphalt or if a corresponding tag was provided in the OSM data itself. Given that OSM is an open user-generated content platform, the quality of the data provided is dependent on the OSM community and their know-how (Neis and Zielstra, 2014). Since the OSM roads serve as the data basis for calculating the RAI, erroneous or missing edits within the road dataset are unavoidable. There can also be large differences in data quality between countries, as well as between rural and urban regions (Neis and Zielstra, 2014). Nevertheless, it is an open source data portal that has seen a strong increase in contributions in recent years (Neis and Zielstra, 2014). It is continuously updated and has been improved for many years now, thus offering great advantages compared to time-consuming and expensive in situ data collection or surveys, which are rarely available (Workman and McPherson, 2021). Partial decreases and increases of the RAI within the admin-1 regions can have different causes. One factor could be misclassifications resulting from the Random Forest approach; another is the OSM road data used. It can happen, for example, that some roads were added incorrectly by OSM users in one year and then deleted in the following year. The RAI calculations depend not only on the input road network, but also on the rural areas and population data used. In this study we included rural areas from 2015, derived from the Global Human Settlement Layer of the JRC, and annual WorldPop population densities. These data sets represent an approximation of reality by statistical techniques and thus serve as a valuable data basis for the RAI calculations, independent of irregular official government data collections.
Research by the World Bank has shown that in 2003, on average 57% of the rural population of the International Development Association (IDA) countries, including the four countries of our study area, had access to the transport network (Roberts et al., 2006). Our results show that the admin-1 regions we analyzed remain below this average, with the exception of Ville de N'Djamena (Chad). These results support the assumption of the World Development Report that the population in poor countries and regions in particular has a longer travel time to reach basic services (World Bank Group, 2003). The report also states that children from the poorest 20% of the population in rural areas of Nigeria have to travel more than five times farther to the nearest elementary school than children from the richest 20%, and even more than seven times farther to the nearest health facility. In Chad, 80% of the poorest 20% of the population need to travel more than one hour to reach the nearest health facility (World Bank Group, 2003).

CONCLUSION AND OUTLOOK
This study analyzes temporal and spatial changes in the access of the rural population to all-season roads in the Lake Chad region between 2013 and 2020. Our approach shows that the calculation of the RAI based on open source data in combination with remote sensing data yields very good results. The availability of time series data from Landsat 8, Sentinel-2 and OSM (in this case road networks) provides a cost-effective and solid basis for detecting asphalt roads using a Random Forest classification. By using the WorldPop population data set, an annual calculation of the RAI is possible. Furthermore, the approach can be fully automated, making it transferable to other world regions, as long as the ground materials show the same spectral responses as in our study area. Otherwise, new training samples must be collected. Our approach can effectively contribute to the global calculation of SDG 9.1.1, an important variable for many political, social and natural scientists. The methodology of the RAI can also be extended to other research questions, such as the accessibility of educational institutions or health sites, which allows drawing conclusions about the need for political action.

Figure 3. RAI between 2013 and 2020 on admin-1 level (data source: Global Administrative Areas, 2012).
Figure 5 also shows that the RAI in Ville de N'Djamena increased from 35.66% to 81.01% between 2013 and 2020. This admin-1 region has the highest RAI values of all processed admin-1 regions throughout the entire analysis period. This contrasts with the RAI values in Kanem (Chad).

Table 2. Country statistics of calculated RAI results.

Table 3. Accuracy assessment values for street classifications for 2017 and 2019.