XDGGS: A community-developed Xarray package to support planetary DGGS data cube computations

Traditional map projections introduce distortions, especially for global data. Discrete Global Grid Systems (DGGS) offer an alternative by dividing the Earth into equal-area grid cells at different resolutions. This paper describes xdggs, a new Xarray extension that simplifies working with DGGS. Xdggs provides a unified API for various DGGS libraries and integrates with the Pangeo ecosystem by extending the widely used Xarray library so that DGGS-specific cell identifiers can serve as an index. This development makes DGGS more accessible and facilitates data analysis on a planetary scale. Xdggs aims to provide a user-friendly API that hides the implementation complexities of different DGGS libraries. Because it integrates with Xarray, a popular tool for geospatial data analysis, xdggs promotes FAIR data practices by simplifying data access and interoperability, and it can become a valuable tool for geospatial scientists and application developers working with global datasets.


Introduction
Traditional maps use projections to represent geospatial data in a 2-dimensional plane. This is both convenient and computationally efficient. However, it also introduces distortions in terms of area, distances, and angles, especially for global data sets (de Sousa et al., 2019). Several global grid system approaches like Equi7Grid or UTM aim to reduce the distortions by dividing the surface of the Earth into many zones and using an optimized projection for each zone. However, this introduces analysis discontinuities at the zone boundaries and makes it difficult to combine data sets of varying overlapping extents (Bauer-Marschallinger et al., 2014; Bauer-Marschallinger and Falkner, 2023).
Discrete Global Grid Systems (DGGS) provide a new approach by introducing a hierarchy of global grids that tessellate the Earth's surface evenly into grid cells of similar size and shape around the globe at different spatial resolutions, and by providing a unique indexing system (Sahr et al., 2004). DGGS are now also defined in the joint ISO and OGC DGGS Abstract Specification Topic 21 (ISO 19170-1:2021). DGGS can serve as spatial reference systems facilitating data cube construction, enabling integration and aggregation of multi-resolution data sources. Various cell geometries (such as hexagons, quadrangles, and triangles) and different tessellation strategies cater to different needs: equal area, optimal neighborhoods, congruent parent-child relationships, ease of use, or vector field representation in modeling flows. Purss et al. (2019) have explained the idea of combining DGGS and data cubes and underlined the compatibility of these two concepts. Thus, DGGS are a promising way to harmonize, store, and analyze spatial data on a planetary scale (Purss et al., 2019).
DGGS data are traditionally often stored in tabular format, where each cell is represented by a row and the cell id is a column serving as the spatial identifier. In addition, datasets usually have other parameters that can be considered as dimensions, such as time, altitude, or ensemble member. For these, Xarray (Hoyer and Hamman, 2017), one of the core packages in the Pangeo ecosystem, is an ideal container for multidimensional DGGS data. At the joint OSGeo and Pangeo codesprint at the ESA BiDS'23 conference (6-9 November 2023, Vienna), members from both communities came together and embarked on implementing support for DGGS in the Xarray Python package, which is at the core of many geospatial big data processing workflows. The result of the codesprint is a prototype Xarray extension named xdggs (https://github.com/xarray-contrib/xdggs), which we describe in this article.

DGGS and open-source software
There are several open-source libraries that make it possible to work with DGGS. For these open-source libraries, Python bindings are available. However, each comes with its own API that is not easy to use, as well as different assumptions and functionalities (Kmoch et al., 2022a). This makes it difficult for users to explore the wider possibilities that DGGS can offer and to compare different DGGS for the same workflow.
The aim of xdggs is to provide a unified, high-level, and user-friendly API that simplifies working with various DGGS types and their respective backend libraries, seamlessly integrating with Xarray and the Pangeo open-source geospatial computing ecosystem. Executable notebooks demonstrating the use of the xdggs package are also being developed to showcase its capabilities.
Except for the Hierarchical Equal Area isoLatitude Pixelization (HEALPix), the above-mentioned DGGS open-source implementations have been evaluated with a focus on area and shape distortions (Kmoch et al., 2022b). The equal-area aspect is becoming an increasingly important criterion to consider when designing large-scale or even global geospatial data cubes and digital twins. Destination Earth (DestinE) is a European flagship initiative aiming to create a highly accurate digital Earth twin to model and simulate natural and human activities for climate adaptation and disaster mitigation (Hoffmann et al., 2023). A key component is the Climate Change Adaptation Digital Twin, which integrates data from various models and multi-decadal climate projections using the HEALPix grid system (Górski et al., 2005). Originally designed for cosmological applications, HEALPix is an equal-area DGGS that offers versatile properties for Earth science modeling. In order to complement the existing statistics with an overview of HEALPix, we also reproduce the original study of Kmoch et al. (2022) for HEALPix in this article.
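As a small illustration of the equal-area property, the following sketch computes the number of HEALPix cells and the nominal cell area per resolution level. It assumes that the resolution level r corresponds to nside = 2^r, and it treats the Earth as a sphere with a mean radius of 6371.0088 km; the function names are illustrative and not part of xdggs.

```python
import math

# Assumption: spherical Earth with the mean radius below (not an ellipsoid).
EARTH_RADIUS_KM = 6371.0088


def healpix_nside(resolution: int) -> int:
    """Subdivisions along a base-cell edge, assuming nside = 2**resolution."""
    if resolution < 0:
        raise ValueError("resolution must be non-negative")
    return 2 ** resolution


def healpix_cell_count(resolution: int) -> int:
    """Total cell count: 12 base cells, each split into nside**2 cells."""
    return 12 * healpix_nside(resolution) ** 2


def healpix_cell_area_km2(resolution: int, radius_km: float = EARTH_RADIUS_KM) -> float:
    """Nominal area of every cell: sphere surface divided by the cell count."""
    sphere_area = 4.0 * math.pi * radius_km ** 2
    return sphere_area / healpix_cell_count(resolution)


for res in (0, 5, 10):
    print(res, healpix_cell_count(res), healpix_cell_area_km2(res))
```

At resolution 5 this gives 12 × 32² = 12,288 cells, and because all cells share the same nominal area, a simple count of cells is proportional to the covered surface.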

XDGGS design and roadmap
The xdggs community contributors set out with a set of guidelines and common DGGS features that xdggs should provide or facilitate, to make DGGS semantics and operations accessible via the user-friendly Xarray API for working with labelled arrays:
• import/export between DGGS and traditional 2D geospatial formats (e.g., rasters, latitude/longitude or UTM rectilinear grids, triangular irregular networks (TINs), vector geometries, or unstructured meshes and point clouds)
• convert between different cell id representations of the same DGGS type (e.g., uint64 vs. string)
• select data on a DGGS by cell ids or by geometries (spatial indexing)
• aggregate/disaggregate data between different resolution levels of a DGGS (downsampling and upsampling, respectively)
• operations between similar DGGS (with auto-alignment)
• re-organize cell ids (e.g., spatial shuffling/partitioning)
• plotting.
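The conversion between cell id representations listed above can be sketched for H3, where the string form of a cell id is the lowercase hexadecimal rendering of its 64-bit integer form. The helper names below are hypothetical and are not the xdggs or H3 library API; the sample identifier is used purely for illustration.

```python
def h3_string_to_int(cell_id: str) -> int:
    """Parse a hexadecimal H3 cell id string into its unsigned 64-bit integer."""
    value = int(cell_id, 16)
    if not 0 <= value < 2 ** 64:
        raise ValueError(f"cell id does not fit into uint64: {cell_id!r}")
    return value


def h3_int_to_string(cell_id: int) -> str:
    """Render an unsigned 64-bit H3 cell id as a lowercase hexadecimal string."""
    if not 0 <= cell_id < 2 ** 64:
        raise ValueError(f"cell id does not fit into uint64: {cell_id}")
    return format(cell_id, "x")


sample = "8928308280fffff"  # illustrative H3-style identifier
as_int = h3_string_to_int(sample)
print(as_int, h3_int_to_string(as_int))
```

The integer form is the more compact choice for an index over billions of cells, while the string form is convenient for exchange with tools that expect textual tokens.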
To ensure seamless integration within Xarray's framework, xdggs exploits Xarray's extension mechanisms, such as accessors, to link DGGS-specific functionalities. For efficiency with large datasets, xdggs relies on existing Python libraries that provide vectorized DGGS functions.
While adhering to established DGGS standards, xdggs acknowledges practical considerations. Deviations from the standards might be necessary to ensure smooth integration with popular DGGS libraries. Scalability is a priority for both high-resolution DGGS (billions of cells) and diverse applications across GIS and Earth-System communities. Vertical scaling can be achieved through optimized DGGS implementations in backend libraries, while horizontal scaling will leverage Xarray's interoperability with Dask. Some operations may require focusing on vertical optimization first before exploring horizontal solutions.
The design also explicitly considered 'non-goals': implementation details that should not be included in xdggs directly but should instead be provided by other libraries. First, xdggs relies on Xarray's extensive capabilities for data manipulation, especially regarding temporal coordinates as an orthogonal data dimension. Second, xdggs integrates with existing Python libraries that implement various DGGS. This eliminates the need for xdggs to replicate core functionalities. Similarly, xdggs concentrates on providing only essential DGGS operations and common resampling methods. More specialized functionalities, especially around re-gridding, should be implemented by combining core operations with complementary tools and libraries.
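One of the essential operations named among the design features, aggregating data to a coarser resolution level, can be sketched for the HEALPix nested indexing scheme, in which the four children of cell p have the ids 4p to 4p+3, so a parent id is obtained by a bit shift. This is a plain-Python illustration under that assumption; the function names are hypothetical and this is not how xdggs implements resampling.

```python
from collections import defaultdict
from statistics import fmean


def nested_parent(cell_id: int, levels: int = 1) -> int:
    """Parent id in the nested scheme: drop two bits per resolution level."""
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return cell_id >> (2 * levels)


def aggregate_to_parent(values: dict[int, float], levels: int = 1) -> dict[int, float]:
    """Average child values per parent cell.

    An unweighted mean is adequate here because all cells of an
    equal-area grid cover the same area.
    """
    groups = defaultdict(list)
    for cell_id, value in values.items():
        groups[nested_parent(cell_id, levels)].append(value)
    return {parent: fmean(children) for parent, children in sorted(groups.items())}


# Six fine cells: cells 0-3 share parent 0, cells 4-5 share parent 1.
fine = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 10.0, 5: 20.0}
coarse = aggregate_to_parent(fine)
print(coarse)  # {0: 2.5, 1: 15.0}
```

Parents with missing children (here parent 1) are averaged over the available children only; a production implementation would need an explicit policy for such partial coverage.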

XDGGS implementation details and examples
The goal of the joint code sprint was to implement these features in an extension for the Xarray package. The library and the examples are publicly available at: https://github.com/xarray-contrib/xdggs
Xdggs represents a DGGS as an Xarray Dataset or DataArray with a 1-dimensional coordinate using the DGGS cell ids as labels. This coordinate is indexed using a custom, Xarray-compatible DGGSIndex, which needs to be instantiated with customizable DGGS-specific parameters like grid name, resolution, and additional attributes. It currently does not support cell ids of mixed resolutions in the same coordinate axis. Xdggs also supports only one DGGS per Dataset or DataArray, but other coordinates, like time, can be indexed together with the one DGGSIndex.
xdggs.DGGSIndex is the base class for all Xarray DGGS-aware indexes. It inherits from xarray.indexes.Index and uses an xarray.indexes.PandasIndex built from cell ids so that selection and alignment by cell id are possible. For each DGGS type that xdggs shall support, a subclass of DGGSIndex is being implemented. Currently, the following DGGSs are usable but still considered experimental:
• HealpixIndex for HEALPix, uses the healpy library
• H3Index for H3, uses the h3ronpy library
• ISEAIndex, uses the dggrid4py library
A DGGSIndex can be set directly from a cell ids coordinate using the Xarray API:

    import xarray as xr
    import xdggs

    # "cell" coordinate carrying the grid description as attributes;
    # [...] is a placeholder for the actual array of H3 cell ids
    ds = xr.Dataset(
        coords={"cell": ("cell", [...], {"grid_name": "h3", "resolution": 3})}
    )
    # auto-detect grid system and parameters
    ds.set_xindex("cell", xdggs.DGGSIndex)
    # set the grid system and parameters manually
    ds.set_xindex("cell", xdggs.H3Index, resolution=3)

The DGGSIndex can be set automatically when converting a gridded or vector dataset to a DGGS dataset. DGGS data creation involves multiple methods, including re-gridding from a latitude/longitude rectilinear grid, re-gridding from an unstructured grid, re-gridding and reprojecting from a raster in a local projection, aggregating from vector point data, and filling from polygon data. Conversely, DGGS data can be converted into various forms, such as re-gridding onto a latitude/longitude rectilinear grid, rasterizing through resampling or projection, converting to vector point data representing cell centroids, and converting to vector polygon data delineating cell boundaries.
This development represents a significant step forward. With xdggs, DGGS become more accessible and actionable for data users. As with traditional cartographic projections, a user does not need to be an expert on the peculiarities of various grids and libraries to work with DGGS and can continue working in the well-known Xarray workflow. One of the aims of xdggs is making DGGS data access and conversion user-friendly, while dealing with the coordinates, tessellations, and indexing under the hood. Figure 1 illustrates the conversion of a dataset originally gridded on a rectangular latitude-longitude grid into hexagonal DGGS cells draped over the globe, using the Xarray tutorial weather data. DGGS-indexed data can be stored in an appropriate format like Zarr or (Geo)Parquet, with corresponding metadata to identify

HEALPix DGGS equal-area assessment
The HEALPix grid was originally developed to process, analyze, and create discretized spherical maps from very large volumes of astronomical data from cosmic microwave background experiments (Górski et al., 2005). It is increasingly considered in the Earth Sciences, e.g., by the World Meteorological Organization (WMO) as an indexing scheme for GRIB2 data files, or for the ICON climate model. This could signify a critical advancement for data representation in the climate sciences, where Xarray is also widely used.
We calculated the area and shape distortion for the HEALPix grid and updated the code and data supplements for the open-source DGGS cell geometry comparisons. Figure 2 shows the strong equal-area properties of HEALPix.
Xdggs demonstrates that we can apply DGGS in place of the traditional rectangular coordinate systems typically used for geospatial raster and Earth-system model data, side-by-side with other orthogonal dimensions such as time or vertical level. This suggests that DGGS can potentially solve some of the core interoperability challenges that plague Earth System science. Specifically, adopting standard DGGS offers a clear path forward for fusing datasets with different resolutions, such as data from Earth-observing satellites and global climate models (Liang et al., 2024). Rather than distributing EO datasets in the ubiquitous Military Grid Reference System with its discontinuous UTM-zone projections, data providers could potentially harmonize their raw data to a suitable DGGS instead (Salgues et al., 2023).
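The kind of per-cell measures summarized in Figures 2 and 3 can be illustrated with a minimal sketch. It assumes that the normalized area is the cell area divided by the mean cell area, and it uses the planar isoperimetric quotient 4πA/P² as one common compactness measure; the exact definitions used in the original study may differ, and real cell geometries would be measured on the sphere or ellipsoid rather than in the plane.

```python
import math
from statistics import fmean


def polygon_area(vertices):
    """Planar polygon area via the shoelace formula."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(vertices):
    """Sum of the edge lengths of a closed planar polygon."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def isoperimetric_quotient(vertices):
    """Compactness 4*pi*A / P**2: equals 1 for a circle, smaller otherwise."""
    return 4.0 * math.pi * polygon_area(vertices) / polygon_perimeter(vertices) ** 2


def normalized_areas(areas):
    """Each area divided by the mean area (assumed normalization)."""
    mean_area = fmean(areas)
    return [area / mean_area for area in areas]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
print(isoperimetric_quotient(square), isoperimetric_quotient(hexagon))
print(normalized_areas([polygon_area(square), polygon_area(hexagon)]))
```

For a perfectly equal-area grid, all normalized area values equal 1, so the spread of this quantity is a direct indicator of area distortion; the hexagon scoring higher than the square reflects that it is closer to a circle.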

Plotting XDGGS data
Existing visualization and spatial analysis tools are primarily designed for rectangular pixels. Adapting these tools to work efficiently with the unique properties of DGGS (like variable resolutions and non-rectangular cells) requires significant development effort. This includes modifying the tools to interpret the hierarchical structure of DGGS and to perform spatial operations accurately.
Xdggs currently does not provide integrated visualization facilities. Several approaches, which are not mutually exclusive, can be used to plot DGGS data directly through the PyViz ecosystem (Signell and Pothina, 2019):
• Convert cell data into gridded or raster data, selecting the grid or raster resolution based on the resolution of the rendered figure, and then utilize existing Python plotting libraries such as matplotlib, cartopy, and holoviews via the Xarray plotting API. For this approach, using datashader to set both the resolution and raster extent dynamically may be beneficial.
• Convert cell data into vector data and plot it using advanced geospatial Python libraries like geoviews, xvec, or geopandas. This approach may involve dynamically downgrading the DGGS resolution and aggregating the data before converting it to vector form to enable interactive plotting of large DGGS datasets. Figure 6 shows an orthographic plot of HEALPix at resolution 5 (ca. 12,000 cells) with the GeoViews package.
• Use libraries that support plotting DGGS data directly. For example, HEALPix provides direct plotting capabilities

Further directions
Nevertheless, continuous efforts are necessary to broaden the accessibility of DGGS for scientific and operational applications, especially in handling gridded data such as global climate and ocean model output, satellite imagery, raster data, and maps. This would require, for example, an agreement, ideally with entities such as the OGC, on a registry of DGGS reference systems (similar to the EPSG/CRS/PROJ database) in order to describe and identify unique DGGS types and their configurations. Such a registry would provide important metadata about the specifics of the DGGS used, in the form of, e.g., an identifier, label, or link to a detailed definition. This would likely need to include the main type (e.g., H3, S2, HEALPix, rHEALPix, ISEA3H), but also required parameters, such as the indexing scheme for HEALPix (nested or ring), the ellipsoid and orientation/rotation for rHEALPix, and the topology, refinement ratio, orientation, and potentially mixed apertures for icosahedron-based DGGS types. It would be good to have synergies here and not reinvent the wheel.
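To make the metadata requirement more concrete, the following purely hypothetical sketch checks whether a dictionary of grid attributes carries the parameters needed to identify a DGGS configuration. Only the keys "grid_name" and "resolution" appear in the example earlier in this article; all other key names and the table of required parameters are illustrative assumptions, not an agreed standard and not the xdggs API.

```python
# Hypothetical table of extra parameters per DGGS type (illustrative only).
REQUIRED_PARAMETERS = {
    "h3": set(),
    "healpix": {"indexing_scheme"},
    "rhealpix": {"ellipsoid", "orientation"},
}

# Assumed allowed values for the hypothetical HEALPix "indexing_scheme" key.
HEALPIX_SCHEMES = ("nested", "ring")


def validate_grid_metadata(attrs: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key in ("grid_name", "resolution"):
        if key not in attrs:
            problems.append(f"missing '{key}'")
    grid_name = attrs.get("grid_name")
    if grid_name is not None and grid_name not in REQUIRED_PARAMETERS:
        problems.append(f"unknown grid_name '{grid_name}'")
    for key in sorted(REQUIRED_PARAMETERS.get(grid_name, set())):
        if key not in attrs:
            problems.append(f"missing '{key}' for {grid_name}")
    scheme = attrs.get("indexing_scheme")
    if grid_name == "healpix" and scheme is not None and scheme not in HEALPIX_SCHEMES:
        problems.append(f"invalid indexing_scheme '{scheme}'")
    return problems


print(validate_grid_metadata({"grid_name": "h3", "resolution": 3}))
print(validate_grid_metadata({"grid_name": "healpix", "resolution": 5}))
```

A shared registry would replace such an ad hoc table with authoritative definitions, so that a single identifier could stand in for the full set of parameters.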
1. The implementation of DGGS on Xarray should be improved further with a more user-friendly API for gridding existing data into DGGS grids, similar to or as part of the GDAL suite.
2. DGGS-indexed reference datasets could be validated and used to highlight case studies, and instructional material could be used in academic courses and workshops, focusing on practical applications such as data fusion, quick addressing of equal-area cell grids, artificial intelligence (AI), infrastructure (e.g., navigation systems for scientific polar ship navigation), and socio-economic and environmental studies. In particular, the emerging ability to select cell ranges from different data sources and to join and integrate them based only on cell ids could make partial data access and sharing more dynamic and easier.
3. Continue to improve the interoperability with STAC catalogs for satellite, modeled, and in-situ data; case studies and reference materials should include a workshop for practitioners to understand the integration and use of STAC and DGGS, emphasizing the importance of open-source tools and the open-source software community.

4. Training materials and Pangeo community sessions should be provided to demonstrate the use of DGGS in Xarray, aimed at enhancing the skill set of practitioners and researchers in professional and academic institutions in geospatial data handling and spatial data analysis.
To address these challenges, the geospatial community needs to focus on developing open-source libraries that support DGGS, enhancing GIS software to handle non-rectangular grids, and creating efficient algorithms for data conversion and spatial analysis. There is also a need for extensive testing and validation to ensure that DGGS implementations can handle real-world datasets without significant loss of information or functionality and without excessive computing resources.
In conclusion, while DGGS offer a promising approach to unifying and managing Earth observation data on a global scale, they also introduce a new set of challenges that require considerable innovation and cooperation among researchers, developers, and industry professionals to overcome. With xdggs, we are taking a significant step in this direction.

Figure 1. The classic air temperature data set from the toy weather data tutorial in Xarray, converted and plotted via a hexagonal DGGS.

Figure 2. Updated summary boxplots of normalized area values for the cells of open-source DGGSs, including HEALPix.

Figure 3. Updated summary boxplots of compactness values for the cells of open-source DGGSs, including HEALPix.

Figure 4. Histogram of normalized area values for HEALPix cells.

Discussion and conclusion
DGGS as unified grid for EO data

Figure 6. Globe view of normalized area and compactness values for HEALPix cells.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVIII-4/W12-2024, FOSS4G (Free and Open Source Software for Geospatial) Europe 2024 - Academic Track, 1-7 July 2024, Tartu, Estonia

from a cell id-based index to several global projections, including the Mollweide and Gnomonic projections. Another alternative is to use high-performance browser-based libraries such as lonboard, which enables interactive plotting in Jupyter notebooks via deck.gl, which directly supports H3 and S2 cell data. This method is efficient for plotting large DGGS data as it only requires transferring cell ids (tokens) and cell data, allowing deck.gl to render the cells efficiently in the web browser using the GPU.