Facilitating Advanced Sentinel-2 Analysis Through a Simplified Computation of Nadir BRDF Adjusted Reflectance

The Sentinel-2 (S2) mission from the European Space Agency's Copernicus program provides essential data for Earth surface analysis. Its Level-2A products deliver high-to-medium resolution (10-60 m) surface reflectance (SR) data through the MultiSpectral Instrument (MSI). To enhance the accuracy and comparability of SR data, adjustments simulating a nadir viewing perspective are essential. These corrections address the anisotropic nature of SR and the variability in sun and observation angles, ensuring consistent image comparisons over time and under different conditions. The $c$-factor method, a simple yet effective algorithm, adjusts observed S2 SR by using the MODIS BRDF model to achieve Nadir BRDF Adjusted Reflectance (NBAR). Despite the straightforward application of the $c$-factor to individual images, a cohesive Python framework for its application across multiple S2 images and Earth System Data Cubes (ESDCs) from cloud-stored data has been lacking. Here we introduce sen2nbar, a Python package crafted to convert S2 SR data to NBAR, supporting both individual images and ESDCs derived from cloud-stored data. This package simplifies the conversion of S2 SR data to NBAR via a single function, organized into modules for efficient process management. By facilitating NBAR conversion for both SAFE files and ESDCs from SpatioTemporal Asset Catalogs (STAC), sen2nbar is developed as a flexible tool that can handle diverse data format requirements. We anticipate that sen2nbar will considerably contribute to the standardization and harmonization of S2 data, offering a robust solution for a diverse range of users across various applications. sen2nbar is an open-source tool available at https://github.com/ESDS-Leipzig/sen2nbar.


Introduction
The Sentinel-2 (S2) mission from the Copernicus program of the European Space Agency (ESA), comprising the Sentinel-2A (S2A) and Sentinel-2B (S2B) satellites, is equipped with the MultiSpectral Instrument (MSI) sensor (Drusch et al., 2012).This sensor is designed to capture data across various spectral bands with high-to-medium spatial resolutions (Table 1).The Level-2A product from S2 provides surface reflectance (SR) data, which are indispensable for detailed Earth surface analysis.This product has been employed in diverse applications, examples include the estimation of carbon fluxes (Pabon-Moreno et al., 2022), the investigation of vegetation dynamics using spectral indices (Montero et al., 2023a), and Land Use and Land Cover (LULC) products generation via Artificial Intelligence (AI) models (Brown et al., 2022).The utility of S2 SR data is significantly enhanced through the incorporation into Earth System Data Cubes (ESDCs, Mahecha et al., 2020;Montero et al., 2023b), which offer an organized spatiotemporal framework, allowing for simplified multitemporal analyses.
While S2 SR data is widely used, its viewing angle (±10.3°) and field of view (20.6°) amplify Bidirectional Reflectance Distribution Function (BRDF) effects due to SR anisotropy (Roy et al., 2017b).To minimize BRDF effects, adjustments simulating a nadir viewing perspective are needed (Roy et al., 2016).These corrections address the directional effects arising from the anisotropic nature of SR and the variability of sunlight and satellite viewing angles.Such adjustments are crucial for the consistent comparison of images taken at different times and under various sensor acquisition conditions.This is particularly important for the processing and analysis of analysis-ready ESDCs, which are increasingly utilized due to their organized spatiotemporal structure and the simplicity of generating them from cloud-stored data (Montero et al., 2023b).
The spectral parameters derived from the MODIS BRDF model facilitate the computation of directional reflectance across any specified sensor viewing and solar angles (Roy et al., 2008).Using this framework, Roy et al. (2008Roy et al. ( , 2016) ) introduced the cfactor method, a straightforward approach for adjusting Landsat SR data by applying the MODIS spectral BRDF model parameters.This adjustment yields Nadir BRDF Adjusted Reflectance (NBAR) by multiplying the observed Landsat SR with the ratio of reflectances predicted by the MODIS BRDF model for both the observed Landsat SR and a standard nadir view under fixed solar zenith conditions.Roy et al. (2017b) and Roy et al. (2017a) extended this methodology for multiple S2 spectral bands.Despite the simplicity of the c-factor method for individual S2 images, a unified Python framework for applying this conversion uniformly across multiple images, especially for ESDCs derived from SpatioTemporal Asset Catalogs (STAC), is missing so far.document is organized into the following sections: Section 2 presents the sen2nbar framework, elaborating on the modules that simplify the processing steps; Section 3 illustrates the application of sen2nbar, providing practical examples; Section 4 explores the limitations of sen2nbar as well as its potential, with a particular emphasis on AI; and Section 5 summarizes our conclusions.

The sen2nbar Python package
The sen2nbar package, developed in Python, facilitates the conversion of S2 L2A Surface Reflectance values into NBAR values through a single function.To achieve this higher level, the package is structured into several modules, ensuring a methodical conversion process leveraging multidimensional arrays via xarray (Hoyer and Hamman, 2017) and numpy (Harris et al., 2020):

The axioms module
This module stores the predefined MODIS BRDF spectral model parameters, namely f iso (λ), f vol (λ), and fgeo(λ) (as shown in Table 1), in addition to preserving the values corresponding to the spatial resolution of each spectral band.In accordance to the methodology proposed by Roy et al. (2017a,b), our implementation encompasses all spectral bands except for the Aerosols (B01) and the narrow NIR (B8A) bands.Consequently, the NBAR values are not computed for B01 and B8A bands.Also note that for the S2 Red Edge bands, the BRDF spectral model parameters were determined through a process of linear interpolation between the red and NIR MODIS BRDF spectral model parameters (Roy et al., 2017a).

The metadata module
This module is tasked with the extraction of two critical sets of data from the scene's metadata files (MTD MSIL2A.xml and MTD TL.xml, Figure 1a), essential for the computation and harmonization of NBAR values: firstly, it extracts the sun and sensor viewing angles; secondly, it determines the processing baseline of the scene.The acquisition of this information is conducted directly from the metadata file when dealing with SAFE files 1 .Alternatively, when operating with ESDCs constructed via stackstac or cubo (Montero et al., 2024), the module retrieves the requisite data from the corresponding metadata STAC asset within a specific item in a STAC catalog using the requests library.This dual-path approach ensures flexibility and adaptability in processing various data sources for NBAR computation and harmonization.

Processing Baseline:
The Processing Baseline (PB) of the scene is extracted and converted into a floating-point numerical format.This PB version serves as a criterion for initiating a harmonization procedure for scenes possessing a PB version of 4.00 or greater.

The kernels module
A kernel-based BRDF model integrates various scattering mechanisms as a linear combination of distinct scattering modes, termed kernels (Lucht et al., 2000).The Ross-Li BRDF model (Section 2.4), also known as the Ross-Thick/Li-Sparse Reciprocal BRDF model, which is the standard for MODIS BRDF (Lucht et al., 2000), is composed of three components: the isotropic scattering parameter, the radiative transfer-type volumetric scattering kernel (Volumetric Kernel K vol ), and geometricoptical surface scattering kernel (Geometric Kernel Kgeo).The objective of this module is to calculate the volumetric and geometric kernels.The computation of these kernels adheres to the mathematical formulations presented by Lucht et al. (2000).

Volumetric Kernel:
The Volumetric Kernel K vol in the Ross-Li BRDF model is represented by the RossThick kernel (Roujean et al., 1992), which is based on the radiative transfer theory presented by Ross (1981): Where cos ξ = cos θ cos ϑ + sin θ sin ϑ cos ϕ.

Geometric Kernel:
The Geometric Kernel Kgeo in the Ross-Li BRDF model is represented by the LiSparse kernel (Wanner et al., 1995), which is based on the geometric-optical mutual shadowing BRDF model by Li and Strahler (1992): where Here, the cos t term is constrained to [-1,1], h/b = 2, and b/r = 1 (Lucht et al., 2000).

The brdf module
This module is dedicated to the computation of the Ross-Li BRDF model, leveraging the scattering kernels sourced from the kernels module and the BRDF spectral parameters obtained from the axioms module, in accordance with the mathematical framework outlined by Lucht et al. (2000).Prior to computation, the BRDF spectral model parameters are transformed into xarray.DataArray objects such that f iso , f vol , fgeo ∈ R |B| .This enhances the efficiency and scalability of the BRDF model computation.Consequently, this computation of the Ross-Li BRDF model outputs values as a multidimensional array BRDF(θ, ϑ, ϕ) ∈ R |B|×H×W :

The c factor module
This module is tasked with calculating the c-factor, adhering to the methodology delineated by Roy et al. (2008).The c-factor is instrumental in modifying the reflectance values to align with any given viewing or solar geometry, as outlined by Roy et al. (2016).To standardize the reflectance measurements to a nadir viewing zenith angle, the value of the viewing zenith angle, ϑ, is predetermined to be 0. Note that c(θ, ϑ, ϕ) ∈ R |B|×H×W : Furthermore, this module integrates information and functionalities from preceding modules to directly derive the c-factor from a metadata file, establishing a comprehensive function specifically tailored for SAFE files.Additionally, recognizing the accessibility of metadata files through STAC items, a separate function has been developed to compute the c-factor directly from STAC items.This latter function is notably equipped with the capability to reproject the c-factor to a designated CRS when required.This reprojection feature is essential for ensuring the alignment and consistency of ESDCs created from scenes that originate in varying CRS, thereby facilitating their integration and analysis within a unified spatial framework.

The nbar module
This module is designed to compute NBAR values, tailored to the format of the input data, whether they are SAFE files or ESDCs constructed via stackstac or cubo: 2.6.1 SAFE files: For SAFE files (Figure 1a), the spatial resolution is band-specific, denoted as λ ∈ B, leading to the necessity of computing NBAR values for each λ independently.Let ρ λ ∈ R H λ ×W λ represent the SR of band λ, with H λ and W λ indicating its spatial dimensions.Initially, metadata files are used to extract the PB ∈ R+ and to compute the c-factor c(θ, ϑ, ϕ) ∈ R |B|×H×W .The PB is then applied to harmonize ρ λ if required: Here, ρ * λ signifies the harmonized SR values.Subsequently, the NBAR values are computed.It's important to note that the c-factor is derived using the original spatial resolution of the angle information (5 km), which makes the computation faster compared to performing angle interpolation beforehand.Therefore, c(θ, ϑ λ , ϕ λ ) undergoes bilinear interpolation, denoted by the function I such that I : R H×W → R H λ ×W λ , to align with the original resolution of band λ.The NBAR values are then generated as follows: Resulting NBAR images are subsequently exported individually into the same SAFE file directory, available in either COG or GeoTIFF formats.

Showcase
In a case study designed to showcase the capabilities of the sen2nbar package in handling multidimensional data, we util- Utilizing the aforementioned ESDC, NBAR values were computed, followed by the calculation of the difference: This metric shows the variations in reflectance values when transitioning from SR to NBAR data, particularly within the spatiotemporal framework of the ESDC. Figure 3 illustrates the ∆ρ λ for the ESDCs.Spatially, while the reflectance scales according to the c-factor and shows minimal variance owing to the confined area of study, the temporal variations are more pronounced, underscoring the significance of NBAR for accurate multitemporal analysis.This effect is especially marked in the Red Edge-NIR region, where discrepancies between values are more substantial compared to other bands.However, it's important to note that the maximum and minimum ∆ρ values exhibit similar ranges across all bands, highlighting the impact of NBAR adjustments.
Reflectance changes, as highlighted previously, can greatly influence the derivation of various products.To illustrate this effect, we computed four widely used vegetation indices for both the NBART and ρ * T : the Normalized Difference Vegetation Index (NDVI, Rouse et al., 1974), the Near-Infrared Reflectance of Vegetation (NIRv, Badgley et al., 2017), the Kernel NDVI (kNDVI, Camps-Valls et al., 2021), and the Inverted Red Edge Chlorophyll Index (IRECI, Frampton et al., 2013).Additionally, the difference between the indices derived from NBAR values and those from SR values was calculated as: Here, ψ represents the index in question.Figure 4 displays the ∆ψ for the ESDCs.It is noted that the impact on NDVI is minimal, but it considerably increases for the other indices, particularly for kNDVI and IRECI, where the largest absolute changes observed reach values of 0.051 and 0.087, respectively.This variance underlines that the influence of reflectance adjustments on derived indices is index-dependent and can be substantial enough to potentially skew analysis reliant on these derived features.

Discussion
Here we present various factors that either constrain or enhance the capabilities of sen2nbar, focusing on data providers and scalability.As we discuss below, this approach enhances the possibilities for big data computations, workflow generation towards analysis-ready data cubes, and therefore eases the application of novel artificial intelligence (AI) methods.values.Future versions of sen2nbar may overcome these limitations by enabling the interconnection of STAC assets (leveraging data from one STAC provider while sourcing metadata from another).This approach, however, depends on the availability of identical scenes across different collections.

Parallel Processing and Lazy Evaluation
sen2nbar employs multidimensional arrays through xarray and numpy, enabling the execution of various array operations.
In the context of ESDCs, the outcome of NBAR processing is another multidimensional array mirroring the dimensions of the input array.This architecture facilitates the acceleration of computations by exploiting lazy evaluations and parallel processing when necessary.Specifically, when an ESDC is generated via stackstac or cubo, the cube initially remains unevaluated, and its actual creation occurs only upon data retrieval.This setup allows the NBAR processing to be integrated into the lazy evaluation chain, as xarray accommodates dask arrays (Rocklin, 2015), ensuring that the NBAR computation is conducted concurrently with cube creation.Moreover, if an ESDC is pre-existing, either stored in memory or on disk, it can be divided into chunks for processing.sen2nbar is then capable of executing the NBAR transformation in a lazy manner on these chunks.This approach is particularly effective with inherently chunked data formats, such as zarr, which can be processed lazily and re-written to disk in chunks, enhancing efficiency.

Refining Multidimensional Dataset Production
sen2nbar can enhance the accuracy of S2 data and derived features, supporting a wide range of applications, from individual snapshot analyses to comprehensive multitemporal studies.This extends to datasets generated from S2 data.This improvement will be largely attributed to the streamlined generation of ESDCs via STAC coupled with the data accuracy refine-ment process offered by sen2nbar.This conversion process ensures interoperability across the temporal dimension for all kind of applications.As a result, sen2nbar streamlines the processing of multidimensional datasets like FluxnetEO (Walther et al., 2022), which originates from MODIS NBAR data designed for vegetation monitoring but is adaptable to S2 data, or BigEarthNet (Sumbul et al., 2019), derived from S2 patches aimed at land cover classification benchmarks.Therefore, the sen2nbar package emerges as a crucial tool in remote sensing, improving the accuracy and utility of data for a broad spectrum of applications.

Artificial Intelligence
The ultimate impact of sen2nbar is its potential to greatly improve AI models in remote sensing.With the introduction of Transformer architectures (Vaswani et al., 2017), AI has seen the development of new models that surpass traditional architectures across a variety of tasks, including Convolutional Neural Networks (CNNs) and Long-Short Term Memory (LSTM) networks.In remote sensing, the adoption of Transformer models has notably advanced, especially with pretrained models such as SITS-Former (Yuan et al., 2022).These foundation models, capable of learning from unlabeled datasets, are instrumental for various specific applications.Notable examples include Prithvi-100M (Jakubik et al., 2023) and SpectralGPT (Hong et al., 2023), which are trained on Harmonized Landsat Sentinel (HLS, Claverie et al., 2018) data and BigEarthNet, respectively.As these models form the basis for diverse applications, the role of sen2nbar in improving the quality of input datasets becomes invaluable.By ensuring that these models have access to clean and accurate data, particularly in a multitemporal context, sen2nbar will greatly contribute to the improvement of AI models' performance in remote sensing applications.

Conclusions
In this paper, we introduced sen2nbar, an open-source Python package crafted to facilitate the computation of Nadir BRDF Adjusted Reflectance (NBAR) values for Sentinel-2 L2A data.By enabling the computation of NBAR values through a single function, sen2nbar greatly simplifies the process, demonstrating its adaptability and compatibility with SAFE files as well as Earth System Data Cubes (ESDCs) derived from STAC through stackstac or cubo.We foresee sen2nbar playing an instrumental role in enhancing the accuracy of remote sensing data analyses within Earth system research.Its utility is especially pronounced in spatio-temporal analysis and in strengthening advanced AI-driven tasks via refined Sentinel-2 datasets.

Figure 1 .
Figure 1.Flowcharts depicting the process of computing NBAR for SAFE files (a) and ESDCs (b).In (a), NBAR values are calculated for each spectral band and subsequently stored either as GeoTIFF or Cloud Optimized GeoTIFF (COG) within a newly created folder titled "NBAR" inside the original SAFE file's directory.For ESDCs, depicted in (b), the NBAR computation results in the generation of a new ESDC, represented as an xarray object.The dashed arrows linking the STAC object in (b) signify that this procedure is automated, eliminating the need for users to manually extract items from the STAC.

Figure 2 .
Figure 2. Diagram illustrating the c-factor computation block.The arrays depicted in this block serve as examples derived from a single metadata file, with the understanding that values could vary across different files.
Data Cubes: For ESDCs constructed from STAC (Figure1b), the spatial resolution is predefined, enabling the generation of NBAR values for the entire ESDC via operations on multidimensional arrays.We denoted ρT ∈ R |T |×|B|×Hr ×Wr as the SR of the ESDC, where T represents the temporal dimension items, and Hr and Wr are its spatial dimensions at resolution r, alongside the bounding box (BBox) of the ESDC.Initially, the module accesses the STAC, retrieving metadata files and PB values for each item τ ∈ T .For each τ , the c-factor cτ (θτ , ϑτ , ϕτ ) is calculated and subsequently concatenated along the temporal dimension such that cT = [cτ (θτ , ϑτ , ϕτ )] τ ∈T , resulting in cT ∈ R |T |×|B|×H×W .The PB values for each τ are also concatenated as PBT ∈ R |T | .This array facilitates the harmonization of ρT : ρ * T = ρτ − 1000, if PBτ ≥ 4, ρτ , otherwise τ ∈T (13) Here, ρ * T represents the harmonized SR values of the ESDC.Following this, NBAR values are calculated.Given that the cfactor is derived from the original spatial resolution of angle information, it undergoes bilinear interpolation through function I such that I : R |T |×|B|×H×W → R |T |×|B|×Hr ×Wr , aligning with the predetermined resolution and BBox of ρ * T .This process enables NBAR computation for ESDCs with uniform spatial dimensions (Hr = Wr, as in cubes generated via cubo) and for those with non-uniform dimensions (Hr ̸ = Wr, as in cubes generated via stackstac).Note that I only interpolates the spatial dimensions H and W .The NBAR values are then derived as follows: NBART = I(cT ) × ρ * T (14)

Figure 3 .
Figure 3. ESDCs illustrating the difference between NBAR and SR values.The initial ESDC presents an RGB composite utilizing visible bands to create a true color reference image.The maximum (blue) and minimum (red) ∆ρ values for each band are indicated in the lower-right corner of the corresponding ESDC.

Figure 4 .
Figure 4. ESDCs illustrating the difference between indices calculated using NBAR and SR values.The first row presents the four calculated indices using the SR values.The second row presents the ∆ψ values for each ESDC.The maximum (green) and minimum (brown) ∆ψ values for each index are indicated in the lower-right corner of the corresponding ESDC.

Table 1 .
sen2nbar, an open-source Python package designed to facilitate converting S2 SR data to NBAR using the c-factor method.sen2nbar is engineered to support both automatic NBAR conversions through a single function, accommodating SAFE files and ESDCs derived from STAC.The BRDF spectral parameters for the Sentinel-2 bands.
The extraction process for solar and sensor viewing angles in degrees involves the aggregation of detector-specific data across each spectral band into a unified 4-dimensional array A ∈ R |Λ|×|Θ|×H×W .This array is structured with dimensions |Λ| = |B| + 1, where B are the bands undergoing conversion (with an additional index allocated for solar information) and Λ is the set of bands added to the solar component; |Θ| = 2 (the number of extracted angles, i.e. 1 https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/data-formats 2.2.1 Solar and sensor viewing angles: Θ = {zenith, azimuth}); and H = W = 23, reflecting the spatial resolution of this information (5 km).The assignment of spatial coordinate values within this grid uses the Upper Left (UL) X (ULX) and Y (ULY) coordinates, derived from the tile's geocoding information.
Impact of Metadata Availability Conversely, sen2nbar does not support the S2 L2A collection from the Element84 STAC due to the absence of necessary metadata files.Similarly, while tools like cubo facilitate the creation of ESDCs from GEE, the lack of metadata files in this context precludes the calculation of NBAR