SpectralIndices.jl: Streamlining spectral indices access and computation for Earth system research

Remote sensing is an essential technology in environmental science to study Earth surface processes. In optical remote sensing, spectral indices (SI) are widely used to quantify the properties of specific surface characteristics. SI mathematically combine reflectance values measured at different wavelengths. To gain an overview and access to such indices, comprehensive catalogs have been published and implemented in various programming languages. However, there is no Julia-based tool available for efficiently managing and using these indices. Here we introduce SpectralIndices.jl , a Julia package designed to retrieve and compute SI. Built on the Awesome Spectral Indices (ASI) catalog, our package enables rapid computation of SI using native functions. The multiple dispatch capability of Julia optimizes data handling across various storage types, ensuring quick load times. While primarily based on the ASI collection, SpectralIndices.jl also accommodates custom-made indices, offering users the flexibility to explore and compare alternative indices. The software is open source and available on github.com/awesome-spectral-indices/SpectralIndices.jl


Introduction
Remote sensing has become essential in many branches of science and applications that deal with geospatial objects.In environmental science, remote sensing helps scientists to monitor, assess for instance, ecosystem status via vegetation health (Kureel et al., 2022), monitor aquatic systems, e.g., algae blooms (Köhler et al., 2024), or manage natural resources (Pinter Jr et al., 2003;Franklin, 2001), to name only a few.With the growing availability of optical Earth observation data, researchers have developed many spectral indices (SI) to analyze specific surface features and phenomena in areas such as vegetation (Zeng et al., 2022), water bodies (Ma et al., 2019), urban environments (Zha et al., 2003), and snow-covered regions (Salomonson and Appel, 2004).Additionally, these indices are used as a basis for environmental machine learning (ML) applications either as targets (Li et al., 2022;Luo et al., 2022;Martinuzzi et al., 2023) or as features (Pabon-Moreno et al., 2022;Montero et al., 2024).
The constantly increasing number of SI has made it essential to develop comprehensive catalogs.The Awesome Spectral Indices (ASI; Montero et al. 2023) suite provides one solution by offering a curated, machine-readable catalog of SI.Additionally, the ASI suite includes a Python library that users can use to query and compute these indices, along with an interface for the Google Earth Engine JavaScript application programming interface (Montero et al., 2022), catering to a diverse user base and a variety of use cases.
There is, however, no SI package developed for Julia, a programming language known for its high-performance computing capabilities (Bezanson et al., 2017).The Julia language has fostered a growing community for geographic and climate applications, as demonstrated by the development of numerous packages such as Oceananigans.jl(Ramadhan et al., 2020), GriddingMachine.jl(Wang et al., 2022), and GeoStats.jl(Hoffimann, 2018) among others.Given that Julia's popularity is rising in Earth and climate science, having no dedicated package for computing SI is a major gap.So far, Julia users can only use a workaround to access Python's extensive libraries and tools from Julia, including the specialized ASI Python package, through PyCall.jl(Johnsonand PyCall Development Team, 2013), but this comes at a cost of efficiency and flexibility.
Here we introduce SpectralIndices.jl, a Julia package that simplifies accessing and computing SI in remote sensing applications.SpectralIndices.jlprovides a user-friendly, efficient solution to immediately access the collection of SI contained in ASI and includes dedicated functions for their computation.We designed it with flexibility in input data types, ensuring ease of extension and maintenance.This paper is organized as follows: Section 2 introduces the SpectralIndices.jlpackage.It details how SI are collected in Section 2.1, describes the computational processes in Section 2.2, and outlines the supported data formats for input data in Section 2.3.Section 2.4 provides an overview of the code quality of the library.An applied use case is presented in Section 3. Points of discussion are explored in Section 4, and conclusions are drawn in Section 5.

Library Overview
SpectralIndices.jl offers direct access to and enables the computation of all SI available within the ASI suite.It is shows the dependencies structure.The core dependencies are imported by default when SpectralIndices.jl is imported.In contrast, the weak dependencies and the accompanying functions and methods are only loaded when the user explicitly calls them in the namespace.
The functions compute index and compute kernel form the core of the computational components.Additionally, helper functions such as get indices, get dataset(), and load dataset() provide further functionalities for the library.

Accessing and Importing Spectral Indices
SI are accessed and imported through the ASI collection.This process is conducted offline using the get indices() function with the argument true.If the argument is set to false ( default setting), the SI list is imported from the one saved in the data folder.When new indices are added to ASI, the local list is updated, and the package is subsequently released with a new version number.With the addition of new indices, the function list is updated through create indicesfun().This function reads formula strings from the JSON file and converts them into native Julia functions.
When importing the library into a Julia session, the complete list of SI is loaded in two distinct ways.The first method imports a dictionary of SpectralIndex objects.A single indices variable is added to the namespace, which contains the comprehensive list of indices.In this dictionary, the keys are the acronym of the index, and the values are the SpectralIndex structs that contain the SI information sourced from ASI.Similar dictionaries, named bands and constants, are also imported.These contain information about the bands and constants used in the SI computation.A schematic representation of this process is shown in Figure 1a.In the second method, all SpectralIndex structs included in the indices list are automatically imported into the namespace, offering a quicker and more convenient method to access information about a specific SI.Calling the index acronym in the Julia read-eval-print loop (REPL) will display its main features, such as the application domain, bands, parameters, formula, and article reference.

Computing Spectral Indices
SpectralIndices.jl provides a primary function, compute index(), to calculate SI.The first positional argument of this function specifies the index (or indices) the user wishes to compute.This argument can be supplied in two ways.The first method involves passing the index's acronym as a string, for example, compute index("NDVI").However, this method is limited as it does not accommodate indices not present in the indices dictionary.The alternative method allows users to specify a SpectralIndex struct directly, such as compute index(NDVI).As discussed in 2.1, all the indices' structs are imported in the namespace as the library is called.Moreover, to compute multiple indices simultaneously, users can pass them as an array to the function, for instance, compute index([NDVI, NDWI]).Unlike the first method, the second one supports the computation of custom indices.
The second component of the compute index() function is the reflectance data (bands) required to compute the SI.The bands can be provided to the function in two ways.The first method involves passing the data as the second positional argument: compute index(NDVI, params), where params is a data structure containing the necessary bands for the NDVI computation.This approach presumes adherence to specific guidelines, such as using the same nomenclature for the bands used in the struct information.For more specific details on data structures, please refer to the documentation.The alternative approach employs keyword arguments, where using the correct spectral band nomenclature as defined in the chosen SpectralIndex struct for computation is necessary.It is important to note that, despite the differences in methodologies for computing SI, all methods are equivalent in terms of computational speed.Many choices are given to accommodate a wide range of potential applications.
Finally, the library offers a compute kernel() function for calculating kernel-based indices, such as the kernel normalized vegetation index (kNDVI) (Camps-Valls et al., 2021).This function operates similarly to compute index().The first positional argument specifies the kernel required for computation, with options including linear, poly, and RBF.Parameters and data necessary for the computation can be supplied as a second positional argument or via keyword arguments.

Support for Data Formats
The library supports the computation of indices from bands stored in multiple data formats.The integration of external data types, such as Tables or DataFrames, is facilitated through external dependencies.Packages declared external are treated as conditional imports, meaning they are only loaded with SpectralIndices.jlwhen explicitly invoked by the user in the script.We use this feature of Julia to maintain a minimal number of core dependencies in the library.Consequently, this approach improves flexibility and ensures a fast load time for SpectralIndices.jl. Figure 1b provides a schematic depiction of this dependency structure.
The data types supported by the library, as of v0.2.9, include: • Native formats: Single values, arrays, matrices, and named tuples are comprehensively supported.These can be used to perform either index or kernel computations in the desired format.
• DataFrames: Leveraging DataFrames.jl (Bouchet-Valat and Kamiński, 2023), this extension computes indices in a two-dimensional, named format.It additionally supports the use and computation of data with GeoDataFrames.jl.

Code Quality
The code is open source and hosted on GitHub.It undergoes rigorous testing through the continuous integration tools provided by the platform, covering over 90% of the codebase.The test cases encompass various precision levels (Float64, Float32, Float16) and all the implemented data types for input bands (arrays, matrices, DataFrames, etc.).Furthermore, it involves computation tests for each spectral index, individually and in two random permutations.This thorough testing process not only ensures robustness but also aids in identifying potential errors in the upstream ASI collection.Finally, the tests incorporate Aqua.jl (Arakaki and Aqua Development Team, 2019), an automated quality assurance test suite explicitly designed for Julia packages.
The package documentation is available online and offers straightforward copy-paste examples for quick starts and more complex use cases to accommodate advanced applications.Additionally, it includes a detailed guide to the application programming interface (API) that serves as a comprehensive resource for users.The documentation can be accessed at https://awesome-spectral-indices.github.io/SpectralIndices.jl/.

Case Study
We demonstrate the use of SpectralIndices.jl by predicting the next time step for 16 different SI.The full list of the indices used is available in Table 1.The prediction task is as follows.
Let yt be the SI at time t.The goal is to predict the SI at the next time step t + 1, denoted as yt+1.The predictive model, f , is given by where θ represents all the model parameters, which are learned from historical data through a training process.The function f might be implemented using various ML techniques such as neural networks, decision trees, or support vector machines.We employ three prediction approaches.In the first approach, we train the model using only the necessary bands.After predicting these bands, we use SpectralIndices.jl to compute the SI.
In the second approach, we train the model on a single index at a time and sequentially predict all 16, deliberately avoiding parallel computing to simulate conditions where the process is parallelized across different pixels.In the third approach, we use all 16 bands to train the model.

Prediction Method
Echo state networks (ESNs; Jaeger 2001) are an efficient ML model for prediction of time series (Kim and King, 2020;Akiyama and Tanaka, 2022).They are based on expanding the input data into a higher-dimensional space, called the reservoir.The computational power of an ESN is determined by the size and spectral radius of the matrix that represents this reservoir (Jiang and Lai, 2019).This reservoir is a recurrent neural network with internal weights, which are set randomly and remain fixed.After expanding the input data into higherdimensional states, linear regression is used to train only the outer layer to make predictions.We chose ESNs for our demonstration due to their straightforward design and effective performance in time series prediction.We implemented the ESNs using ReservoirComputing.jl(Martinuzzi et al., 2022).Performance is quantified by the root mean square error (RMSE), , where n is the number of observations, yi is the actual value of the i-th observation, and ŷi is the predicted value of the i-th observation.
All experiments were run on a Dell XPS 9510 fitted with an Intel Core i7-11800H CPU and 16 GB of RAM.No GPU acceleration was used.

Data
In this study, we used spectral bands from the moderate resolution imaging spectroradiometer (MODIS) (Vermote, 2015) for the time period 2000-2017.We focused on time series data from a single pixel at latitude 51.075 and longitude 10.425, located in the Hainich Forest in central Germany.This forest predominantly consists of deciduous trees.We applied a Savitzky-Golay filter with a 7-day window and a third-order polynomial to smooth and denoise the data (Savitzky and Golay, 1964).

Results
Figure 2 presents the main findings of our case study.Computations using the bands only (approach 1) are performed faster than the alternatives.Specifically, predicting the bands in sequence (approach 2) is about 5.9 times slower than predicting the bands and then computing the indices (approach 1).Predicting all 16 bands (approach 3) is 47 times slower than approach 1.This slower speed stems from the increased number of input variables, which requires a larger reservoir matrix.Consequently, matrix multiplications become more costly, increasing the computation time.
Despite the gains in computational speed, there is no loss in prediction accuracy.For example, the original time series for the year 2017 and the corresponding single run prediction overlap, Figure 2b.Across a hundred runs, the RMSE indicates that no single approach consistently outperforms the others, with all approaches yielding comparable performance, Figure 2c.This study case illustrates how using SpectralIndices.jlsimplifies SI calculations and can accelerate ML tasks, such as time series prediction.

Discussion
This manuscript shows how SpectralIndices.jlcan assist practitioners in directly accessing SI computation.We illustrate the package's structure and modularity.In this section, we will discuss how the package and its features can contribute to a remote sensing scientist's computational toolkit.
The library is integrated within the growing Julia community and is dedicated to Earth system and climate software, specifically designed to support remote sensing developers.Our case study demonstrates how this package can reduce computational costs in ML applications by minimizing the number of variables required for training and prediction, Figure 2.More extensive ML investigations could capitalize on the wealth of information in the ASI collection by exploring various models and properties of SI (Reinhardt et al., 2024).
The software offers multiple data input options.However, this selection is not exhaustive, and expanding these options would increase the adaptability of the package.Plans for future expansions are already in progress, including extensions like Rasters.jl (Schouten and Rasters Development Team, 2019) and Tables.jl(Quinn et al., 2021), which will offer alternatives for the existing extensions.Future data types that are being considered for inclusion in SpectralIndices.jlare AxisArrays.jl(Bauman and AxisArrays Development Team, 2014), NamedDims.jl(White and NamedDims Development Team, 2019) and NamedArrays.jl(van Leeuwen and NamedArrays Development Team, 2013).A full list is available in the SpectralIndices.jlGithub repository in issue number 8. Additionally, computationally intensive applications benefit from parallel computation, a feature not yet fully integrated into SpectralIndices.jl.While it is possible to achieve parallel computation of SI using Distributed.jl(Bezanson and Distributed Development Team, 2011) and adapting internal functions with pmap, integrating this capability natively within the package would improve its functionality.

Conclusion
In this work, we introduced SpectralIndices.jl, a userfriendly package to efficiently compute SI from the ASI collection.Designed with modularity in mind, our software aims to simplify the process for researchers.Our case study demonstrates how this library can help streamline current research projects by facilitating easy access and computation of SI.

Figure 1 .
Figure 1.Code and dependencies structure.In panel a, we show the structure of offline and online building blocks of the library.Dashed lines indicate that an action is required from the user, while full lines indicate that functions are executed automatically.Starting from the top left, the function get indices() with the argument set as true retrieves the JSON file from the Awesome Spectral Indices (ASI) GitHub page.This file is saved in data.The function create indexfun() builds the SI functions from the formula strings and writes them into a indices funcs.jl in src.During runtime, three functions are executed, namely create constants(), create bands(), create indices().The results are loaded into namespaces and consist of three dictionaries containing the built structs SpectralIndices, Bands, and Constants.These structs contain the information retrieved from the ASI Github page.Not depicted is the additional loading into namespace of all the structs contained in indices.Panel bshows the dependencies structure.The core dependencies are imported by default when SpectralIndices.jl is imported.In contrast, the weak dependencies and the accompanying functions and methods are only loaded when the user explicitly calls them in the namespace.

Figure 2 .
Figure 2. Comparison of three approaches for predicting the next step of 16 spectral vegetation indices (VI).Approach 1 utilizes spectral bands for training and prediction, followed by the calculation of VI using SpectralIndices.jl.Approach 2 involves predicting each VI sequentially.In Approach 3, all VIs are used for both training and prediction.Panel a presents the computation times for all three approaches, with mean times derived from an average of 100 runs.Panel b displays the next-step predictions for all 16 VIs from a single run for a pixel located in Hainich, Germany, in the year 2017.The predicted and original signals overlap.Panel c quantifies the performance of the different approaches using the root mean square error (RMSE), with the mean RMSE calculated over 100 runs.The plot is done using Makie.jl(Danisch and Krumbiegel, 2021),