Weeknotes: Zarr and GeoZarr

May 8, 2026

I work in the same group as the folk producing the Tessera foundation model. Whilst the foundation model itself is exciting, one of the things I've been trying to get my head around is how they're using the Zarr data format for the underlying storage, rather than the more common GeoTIFF format. I find GeoTIFF frustrating to use for a number of reasons, and so was curious to learn more about the benefits of Zarr over the more established format.

Zarr

Zarr itself isn't anything to do with geospatial; rather, it is a container format for storing a set of multi-dimensional arrays, optionally with some hierarchy to give structure to the collection.

Multi-dimensional arrays

Given people are familiar with raster/bitmap images as a way of storing data, let's start from there. In terms of image data we typically have a single two dimensional array. In TIFF you can actually store multiple "bands", which is to say multiple images of the same width and height. This is actually how true colour images work: there are three greyscale images stored, one each for red, green, and blue. Or, to put it another way, TIFF as a file format allows for three dimensional arrays. You can have up to 2^16 bands in a TIFF image, so you can store quite a few different bands beyond the three you find in colour images. With GeoTIFFs in biodiversity work we'll often use this to, say, split out results by taxa, or store a layer per land cover class, etc.

In TIFF you can't go beyond three dimensions of data, but with Zarr you can store proper n-dimensional arrays. Tessera is a global, temporal, foundational model with 128 "embeddings" - which is just to say they need 4 dimensions in their data: x and y for location on the planet, another for the year of observation, and then one layer per embedding. To do this with GeoTIFF they'd have to pick one of those latter two dimensions and just have a file per year or file per embedding.
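To make the shape concrete, here is a minimal sketch using NumPy as a stand-in for a Zarr array. The axis order, the tiny sizes, and the variable names are all my own assumptions for illustration, not how Tessera actually lays out its data.

```python
import numpy as np

# Toy sizes so this runs instantly; real data would be vastly larger.
n_years, n_embeddings, height, width = 3, 128, 4, 5

# One 4-dimensional array: (year, embedding, y, x).
# This axis order is an assumption for illustration only.
embeddings = np.zeros((n_years, n_embeddings, height, width), dtype=np.float32)

# With a 3-dimensional format you would have to split along one axis,
# for example one (embedding, y, x) stack per year.
per_year_stacks = [embeddings[year] for year in range(n_years)]

print(embeddings.ndim)           # 4
print(len(per_year_stacks))      # 3 separate "files"
print(per_year_stacks[0].shape)  # (128, 4, 5)
```

The list of per-year stacks is roughly what the file-per-year GeoTIFF approach would give you, with the year dimension pushed out into the file names.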

Multi-multi-dimensional arrays

In fact, you don't just need to stick to one multi-dimensional array, you can store multiple of them. The reason you would want to do this over adding more dimensions to an existing array is if you have related data that doesn't actually have the same size dimensions.

The most obvious example of this in geospatial work would be if you wanted to store data at multiple zoom levels. In theory you could have a 3-dimensional array where you have x, y, and zoom level, but this would be wasteful: only the most zoomed in layer would be filled, and each other layer would be increasingly full of empty space, and that empty space has to be accounted for because n-dimensional arrays are of a fixed size on all axes.

Instead, it's more efficient to have a distinct array per zoom level. You can actually do this with TIFF files, where you can embed multiple distinct images within a single TIFF file, which effectively gives you multiple unrelated two or three dimensional arrays. This is how Cloud Optimised GeoTIFFs work when they support multiple resolutions.
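To put rough numbers on that waste, here is a small back-of-the-envelope calculation. The base size of 100 pixels and the five zoom levels are made-up numbers, chosen only to show the scale of the difference.

```python
# Side length of the (square) image at each zoom level, doubling each time.
# The base size of 100 and the five levels are made-up numbers.
sizes = [100 * 2**level for level in range(5)]  # 100, 200, 400, 800, 1600

# Option 1: one 3-dimensional array (zoom, y, x). Every level must be
# as large as the most detailed one, because the array is a fixed size.
largest = max(sizes)
cells_single_array = len(sizes) * largest * largest

# Option 2: a separate 2-dimensional array per zoom level.
cells_separate_arrays = sum(size * size for size in sizes)

wasted = cells_single_array - cells_separate_arrays
print(cells_single_array)     # 12800000
print(cells_separate_arrays)  # 3410000
print(f"{wasted / cells_single_array:.0%} of the single array is padding")  # 73%
```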

Xarrays

It's not part of the Zarr standard here, but it seems common practice and it informs what comes next, so I need to drag in xarrays here. Let's imagine we have a 4 dimensional array of data from the Sentinel-2 satellite observation mission, where we have x and y to locate a pixel on the surface of the globe, another dimension for year of capture, and finally another dimension being the wavelength band.

If I was using just a multi-dimensional array to look up this data I'd have to write:

value = satellite_data[100, 100, 3, 7]

Which is a bit confusing to read. So with xarray you define a 1-dimensional array for each axis that maps values to each location: for the year axis we can map 0=2017, 1=2018, and so on, and similarly we can assign names to the wavelength bands, and so that query can be something like:

value = satellite_data.sel(
    x=500000.0,
    y=4800000.0,
    time=2022,
    band="near_infrared"
)

Whilst this might seem a bit semantic, it means the code better matches the intent now - it's clear what year is being used and what band is being used and I can't accidentally swap the 3 and 7 in the earlier version and not notice.
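To show what those per-axis index arrays are doing, here is a minimal sketch of label-based lookup built from plain NumPy. This is only my illustration of the idea, not how xarray is actually implemented, and the `select` helper, the axis order, and the coordinate values are all made up.

```python
import numpy as np

# Toy data cube with axes (y, x, time, band); values are just a running count.
data = np.arange(2 * 3 * 4 * 2).reshape(2, 3, 4, 2)

# One 1-dimensional coordinate array per axis, mapping positions to labels.
coords = {
    "y": np.array([4800000.0, 4799990.0]),
    "x": np.array([500000.0, 500010.0, 500020.0]),
    "time": np.array([2019, 2020, 2021, 2022]),
    "band": np.array(["red", "near_infrared"]),
}
axis_order = ["y", "x", "time", "band"]

def select(**labels):
    """Translate axis labels into integer positions, then index the array."""
    index = []
    for axis in axis_order:
        matches = np.flatnonzero(coords[axis] == labels[axis])
        if matches.size == 0:
            raise KeyError(f"{labels[axis]!r} not found on axis {axis!r}")
        index.append(int(matches[0]))
    return data[tuple(index)]

value = select(x=500000.0, y=4800000.0, time=2022, band="near_infrared")
print(value == data[0, 0, 3, 1])  # True
```

Because the arguments are named, their order no longer matters, which is exactly the protection against swapped indices described above.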

Xarray isn't part of the Zarr specification, but Zarr is a good way to store xarrays. In this example we need to store a single 4-dimensional array and four 1-dimensional arrays (one mapping per axis), and with Zarr we can store all those together. It's so popular in fact, that this then motivates use of the next feature: groups.

Groups

Groups are like folders in your file system. Within a Zarr file you can create groups of arrays, and even groups of groups if you want.

At first I was confused by groups, as in all the examples I saw there was a distinct dataset used per group, rather than putting multiple datasets as distinct arrays within the root level of the Zarr file or in a single group. For example, we used the zoom levels above as a motivator for multiple multi-dimensional arrays within a Zarr file. In practice, what you find is people using a group per zoom layer, which confused me, until I understood how popular xarrays are!

What happens is that at each zoom level you'll need a new x and y axis index array, because as you zoom in each step size is different, and you'll have a different number of steps. But every zoom level wants to have an "x" index and not "x_at_zoom_level_123", and the solution for that is just to push each zoom level into a group where each zoom level can then have its own "x" and "y" index array, and we avoid an annoying namespace clash. This isn't the only motivator for groups, but it does feel like the common reason you see groups being used with one "primary" array in them - it's because the index arrays all probably have the same names in each group.
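Here is a rough sketch of that namespacing idea, using a flat Python dictionary keyed by path as a stand-in for a Zarr hierarchy. The group names (`zoom_0` and so on), the sizes, and the coordinate values are hypothetical.

```python
# A flat dictionary keyed by path stands in for a Zarr hierarchy here;
# the group names "zoom_0", "zoom_1", ... are hypothetical.
store = {}

for level in range(3):
    size = 100 * 2**level            # pixels per side at this zoom level
    step = 360.0 / size              # degrees of longitude per pixel
    group = f"zoom_{level}"
    # Each group gets its own "x" and "y" index arrays under the same names.
    store[f"{group}/x"] = [-180.0 + step * i for i in range(size)]
    store[f"{group}/y"] = [90.0 - (step / 2) * i for i in range(size)]
    store[f"{group}/data"] = (size, size)  # placeholder for the real 2-D array

for path in sorted(store):
    print(path)
```

Every group ends up with an `x` and a `y`, but each has a different length and step size, and there is no need for names like `x_at_zoom_level_2`.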

Tessera does a neat trick with groups. Tessera doesn't use a single map projection for the entire globe, rather it uses the UTM projection system where the globe is split into a series of chunks each with their own local projection from lat/lng to pixel coordinate, which provides better local accuracy. The downside of UTM is that it means a map of the UK will not align with a map of Brazil, say. To let it get away with this Tessera uses a group per UTM region, so all the 4-dimensional arrays are kept in the most accurate projection they can. This to me seems quite a neat use of the groups, and I need to do some number crunching to work out how that compares with using other single-image equal-area-per-pixel projections like Mollweide, which is my current go-to projection.
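As an illustration of how a location might be routed to a per-zone group, here is a sketch using the standard 6-degree UTM zone formula. The `group_name` scheme is purely hypothetical (I don't know how Tessera names its groups), and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_zone(longitude_deg):
    """Return the standard UTM zone number (1 to 60) for a longitude.

    Ignores the special-case zones around Norway and Svalbard.
    """
    zone = math.floor((longitude_deg + 180.0) / 6.0) + 1
    return min(zone, 60)  # a longitude of exactly 180 belongs to zone 60

def group_name(longitude_deg, latitude_deg):
    # Hypothetical naming scheme: zone number plus hemisphere letter.
    hemisphere = "N" if latitude_deg >= 0 else "S"
    return f"utm{utm_zone(longitude_deg):02d}{hemisphere}"

print(group_name(-0.13, 51.5))     # London    -> utm30N
print(group_name(-46.63, -23.55))  # São Paulo -> utm23S
```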

A quick note on what you can change

So far when I talk about why you'd want to have multiple datasets within a single Zarr file it's mostly been about changing the dimension sizes of the array, but you can also change other details within a single Zarr file. You can have more or fewer dimensions (as shown by the index arrays): perhaps have one greyscale dataset, another be an RGB visual colour version of the same data, and then another being raw hyper-spectral satellite data with dozens of wavelengths. Or you could store different datatypes in different arrays, perhaps different accuracy of floating point value depending on whether you want a true high accuracy reference using float64 or a float16 version to feed to a GPU.
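The float64 versus float16 trade-off is easy to see with NumPy. This is just a sketch with made-up values, showing the size saving and the accuracy you give up for it.

```python
import numpy as np

reference = np.full(1000, 0.1, dtype=np.float64)  # high-accuracy reference
compact = reference.astype(np.float16)            # smaller copy, e.g. for a GPU

print(reference.nbytes)  # 8000 bytes
print(compact.nbytes)    # 2000 bytes

# float16 cannot represent 0.1 as closely as float64, so some accuracy is lost.
error = abs(float(compact[0]) - 0.1)
print(error > 0.0)       # True
```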

Technically you can also vary multiple of these (type, dimensionality, size of dimensions) per group: Zarr doesn't care, it's just a container format. To make sense of it requires someone to document what's in there, and the more you mess around the harder that is. But we'll cover that more shortly.

Chunks

If you've ever played with putting maps onto a website, you'll probably be familiar with the concept of tiling. What happens is you have multiple images of the map you're trying to show, at different zoom levels. So the first image is, say, a 100x100 image of the entire world, then you have a 200x200, 400x400, and so forth. The problem as your images get bigger and bigger is that's a lot of data to ship to the browser looking at the map, when they're only going to be looking at a small area as they zoom in. To save people downloading full global resolution maps when they zoom in on Skegness to see where they might go on holiday, the image data is split into tiles. These tiles are typically the same size as the top level image, so in our example we'd have a 100x100 pixel image when most zoomed out, then we'd have 4 100x100 pixel images that form a 2x2 grid, then 16 100x100 images, and so forth. This scheme requires a little extra plumbing but makes accessing random bits of the globe a lot faster and leaner.
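The "little extra plumbing" is mostly integer arithmetic. Here is a minimal sketch using the 100x100 tiles from the example above; the function names are mine and don't come from any particular mapping library.

```python
TILE_SIZE = 100  # pixels per side, matching the example above

def tiles_at_level(level):
    """Number of tiles at a zoom level, where level 0 is a single tile."""
    per_side = 2**level
    return per_side * per_side

def tile_for_pixel(x, y):
    """Which tile (column, row) holds a pixel of the full image at some level."""
    return x // TILE_SIZE, y // TILE_SIZE

print([tiles_at_level(level) for level in range(4)])  # [1, 4, 16, 64]
print(tile_for_pixel(250, 130))  # (2, 1), e.g. within the 400x400 level
```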

Internally Zarr does something similar. When you create a Zarr file (at least when using the Python bindings), you have to specify the chunk size on each dimension you're creating, and then whilst you work with the abstraction of a full sized multi-dimensional array, internally Zarr will be finding the chunks that you want as you read/write bits of the array space.
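Conceptually, finding the right chunk is a division per axis. The sketch below shows the arithmetic only, not Zarr's own code, and the chunk shape is a hypothetical one for a (year, band, y, x) array.

```python
def locate(index, chunk_shape):
    """Split an array index into (chunk coordinate, offset inside that chunk)."""
    chunk_coord = tuple(i // c for i, c in zip(index, chunk_shape))
    offset = tuple(i % c for i, c in zip(index, chunk_shape))
    return chunk_coord, offset

# Hypothetical chunking for a (year, band, y, x) array.
chunk_shape = (1, 4, 256, 256)

chunk_coord, offset = locate((3, 7, 1000, 300), chunk_shape)
print(chunk_coord)  # (3, 1, 3, 1)
print(offset)       # (0, 3, 232, 44)
```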

If you look at a Zarr output, it's actually stored on disk as a directory, in which you will find groups as top level folders, and then within them a hierarchical storage of chunks. This is intended to make random access easy, but isn't very portable compared to a single file like with GeoTIFF. There is a zip store version, which you can use to have a single resulting Zarr file, but I believe that comes at some performance cost, particularly for writing.
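As a rough illustration of what ends up inside an array's directory, here is a sketch that generates chunk names in the style I understand Zarr version 3 to use by default (a `c` prefix with `/` between chunk coordinates). Treat that naming as my assumption; other versions and settings lay things out differently.

```python
from itertools import product

def chunk_keys(shape, chunk_shape, prefix="c", separator="/"):
    """List the chunk keys for an array, in the style of Zarr v3 defaults."""
    # Ceiling division: a partial chunk at the edge still needs its own key.
    counts = [-(-size // chunk) for size, chunk in zip(shape, chunk_shape)]
    keys = []
    for coord in product(*(range(n) for n in counts)):
        keys.append(separator.join([prefix, *map(str, coord)]))
    return keys

keys = chunk_keys(shape=(500, 300), chunk_shape=(256, 256))
print(keys)  # ['c/0/0', 'c/0/1', 'c/1/0', 'c/1/1']
```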

On one hand, you can use Zarr without needing to know this, but like the tiling zooming images example, how you specify the chunk sizes will have an impact on the performance of your Zarr file in terms of access and file size, so it's something you'll likely want to tweak based on benchmarking for your particular application.
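To see why chunk shape matters, here is a sketch that counts how many chunks a read has to touch under two layouts of the same size. The array axes, chunk shapes, and read regions are all made up for illustration, and real performance also depends on compression and on the storage underneath.

```python
def chunks_touched(start, stop, chunk_shape):
    """Count the chunks overlapping the half-open region [start, stop)."""
    total = 1
    for lo, hi, chunk in zip(start, stop, chunk_shape):
        first = lo // chunk
        last = (hi - 1) // chunk
        total *= last - first + 1
    return total

# Array axes are (time, y, x); both chunk layouts hold the same number of cells.
spatial_chunks = (1, 512, 512)   # suits reading a map of one time step
temporal_chunks = (64, 64, 64)   # suits reading a time series at a point

# Read 64 time steps at a single pixel.
series = ((0, 1000, 1000), (64, 1001, 1001))
print(chunks_touched(*series, spatial_chunks))   # 64
print(chunks_touched(*series, temporal_chunks))  # 1

# Read a 512x512 window from a single time step.
window = ((0, 1024, 1024), (1, 1536, 1536))
print(chunks_touched(*window, spatial_chunks))   # 1
print(chunks_touched(*window, temporal_chunks))  # 64
```

Neither layout wins outright: each is 64 times better than the other for one of the two access patterns, which is why benchmarking against your own workload is the only real guide.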

GeoZarr

The Zarr file format is deliberately a generic container format with a lot of flexibility, and as we've seen you can store things in many different ways, and so how do you know when you get a Zarr file what's in there?

The solution is a bit like how GeoTIFF is built upon the TIFF format. The TIFF file specification says nothing about storing geospatial constraints on the data inside a TIFF file, but it does have a flexible metadata section which lets you specify additional properties about the file. Technically these are just additional "tags" - which is why it's called Tagged Image File Format^[1], the same as width and height and colour type are also stored in tags in the file. So the GeoTIFF standard defines a dozen or so extra tags that indicate the map projection used, the spatial extent of the area covered by the raster, and so forth.

Zarr is even more flexible than TIFF. With TIFF there is some base assumption that your first two dimensions will be x and y within an image space, but all Zarr says is that you have a collection of multi-dimensional arrays, and then defines the Zarr Adoptions Conventions which is a sort of standardisation for different ways to interpret Zarr files for different use cases. For example, there's a set of standards defined for Open Microscopy data, and then another called GeoZarr that actually pulls together three smaller conventions into one.

GeoZarr is based on:

geo-proj: a standard metadata way to express the coordinate system used by the stored geospatial data
spatial: a way to define which bit of the planet you happen to be looking at
multiscales: a way to define groups as different zoom levels when displaying data

How do you tell which conventions are used in a given Zarr file? Again, there's a convention for that, where you can list the standards you're following in the metadata, as you can see in this example. This isn't a mandatory part of the Zarr standard as far as I can tell though.

What I've not figured out yet is how you can have your own grouping, like Tessera does, and support multiscales grouping. In theory it's all good as you can have nested groups, but reading the multiscales standard I couldn't see if you'd have each UTM group have multiple multiscales subgroups, or multiscales should be at the top level - I need to dig into the standard a bit more here.

Wrapping up

This was just a brain dump of what I've learned kicking the tires on Zarr. On one practical note, I did find the compression to be pretty good: I took 34k species rasters I had in individual GeoTIFFs, totalling 46 GB, and when converted to a single Zarr store they were just 32 GB. Nice, but it did take the better part of 5 hours to do the conversion - though that was without making any attempt to optimise the chunk size.

My interest in this is whether I can use Zarr/GeoZarr the way GPKG is used: as a single container to store multiple results in species processing pipelines. If that makes sense, then I could exploit this in my declarative geospatial library Yirgacheffe as a way to let me express parallelism not in code, but basically inferred from the storage format on disk: if your input to an equation happens to contain many layers, then actually do the same calculation for all layers and write out another multi-band result. Then Yirgacheffe can parallelise that internally to ensure optimal performance because it has a more holistic view than it currently has when fed one species at a time.

^[1] Oddly enough, I recall TIFF being short for "Tagged Image File Format" from my youth, and Wikipedia also corroborates this, but the actual TIFF specification never explains what TIFF means, if anything.

Tags: weeknotes, zarr, xarray, yirgacheffe

Tech notes by Michael Winston Dales