Weeknotes: Closing in on Yirgacheffe 2.0

2 Apr 2026

I filed the first issue about making a 2.0 release Yirgacheffe in the middle of last year, and now I finally have a PR for it after a couple of days with the code hatchet.

As it stands, whilst the code churn seems high with 65 files touched and +1,823 -2,744 line changes, it's actually a relatively minor change, as most of that is tidying up from me removing code, and some renaming things. You can see we've lost close to 1000 lines of code and I've added more test coverage at the same time. In the change log I have 3 added things, 2 changed, 1 fix, and 10 removal notices.

Most of the change for the 2.0 release I started adding to the 1.x series since August time, and I've added almost everything I wanted in 2.0 release to the 1.x series, with the prevision that I could not make any changes that would break the existing APIs. This actual 2.0 release is just a few breaking API changes where I wanted to rename a couple of things, and then removing a lot of APIs that have been marked as deprecated for half a year now. I'm really quite pleased that I've been able to turn the ship around like this. Whilst it means the 2.0 new features list looks very lack-lustre, if you could release 1.5 or so as the first 2.0 build, I've been able to add quite a lot without actually needing to do the final breaking release.

When I first posted about Yirgacheffe the AOH code would have looked a little like this:

import json
import numpy as np
from yirgacheffe.layers import RasterLayer, VectorLayer

# load species data
with open("species_info.json") as f:
    species_info = json.load(f)

# Load the geospatial layers: two raster, one polygon based
hab = RasterLayer.layer_from_file("habitats.tif")
elv = RasterLayer.layer_from_file("elevation.tif")
range = VectorLayer.layer_from_file_like("species_data.gpkg", land)

# Find the intersection and restrict the data to just that area
layers = [hab, elv, range]
intersection = Layer.find_intersection()
for layer in layers:
    layer.set_window(intersection)

# AOH calculation
filtered_elv = elv.numpy_apply(lambda chunk: np.logical_and(
    chunk >= species_info.elevation_min, chunk <= species_info.elevation_max))
filtered_hab = hab.numpy_apply(lambda chunk: chunk.isin(species_info.habitats))
aoh = land * filtered_elv * mask

result = RasterLayer.empty_raster_layer_like("aoh.tif", result)
aoh.save(result)

The main win at that point of Yirgacheffe was two-fold: internally it does memory chunking of the inputs, so even though your habitat and elevation rasters are 150 GB each (for example) this will still load fine on a small laptop because Yirgacheffe knows not to load all the data at once. The other being that it will know the geospatial constraints, and so once you've told it about the intersection bit in the middle there, it will do all the correct pixel offset math for you.

But there is still clearly a lot of clutter there.

In this you see a lot of numpy, as Yirgacheffe didn't do much in way of operators itself, it mostly chunked up data and then let you pass that to numpy functions, and you see there's a chunk where we have to specify we're interested in the intersection of the layers, as otherwise we'll be working at a global scale for a species that statistically only occupies a small portion of the planets surface. Since then, Yirgacheffe really is now properly on a par with numpy and other data-science libraries in supporting direct operation on layers, now supporting 50 operators that cover a large set of standard math, numpy style array operators, and even some GPU style convolution work like found in pytorch.

Because Yirgacheffe now does more of the math itself, it also can figure out that for this calculation you'll want an intersection of the layers based on spotting that we're using multiply here, and any area outside the species range will be empty, and zero times something is zero, so not only will it clip the result to that area, it won't even try to calculate those areas outside the range bounding box. Conversely if you adding the layers it knows it'd need to do a union of the layers. For each of those 50 operators it has some logic to work out what this means in terms of how to marry up all the differently sized input layers.

You might also have spotted when we loaded the species range data as a vector layer we had to say it was "like" one of the raster layers so we knew what scale to rasterize it at. You no longer need to do that, again, because Yirgacheffe is doing more of the heavy lifting for you it knows more about your inputs and can infer if you're combining a vector layer with a raster layer then you'll obviously want it rasterized at the same resolution.

So this code now becomes:

import json
import yirgacheffe as yg

# load species data
with open("species_info.json") as f:
    species_info = json.load(f)

# Load the geospatial layers: two raster, one polygon based
with (
    yg.read_raster("habitats.tif") as hab,
    yg.read_raster("elevation.tif") as elv,
    yg.read_shape("species_data.gpkg")as range,
):
    # AOH calculation
    filtered_elv = (elv >= species_info.elevation_min) && (elv <= species_info.elevation_max)
    filtered_hab = hab.isin(species_info.habitats)
    aoh = land * filtered_elv * mask
    aoh.to_geotiff("aoh.tif")

I've got fancy and used the Pythonic "with" style of loading there, but you don't have to, you can just use normal variable assigning there if that's your bag. But the point is that hopefully this code is much more close to the scientific method and not cluttered with the challenges of getting the computer to do the job you need it to.

To anyone who read the PROPL paper on Yirgacheffe, which was published a few months ago now, that example hasn't change since then, but that's because I was already well on the way to 2.0 at that point. But the point is that by leaning into Yirgacheffe being a declarative library where you do all the math on abstract geospatial layers, leaning more into the user never thinking about pixels or alignment offsets, by giving more burden to Yirgacheffe, it because possible for it to do even more to help the user: a great positive feedback loop.

Since the PROPL paper I've added a way to have a dynamic reprojection of a raster to another map projection, so if you have two rasters in different projections you want to combine, you can now just do:

with (
    yg.read_raster("layer_in_wgs84.tif") as layer1,
    yg.read_raster_like("layer_in_mollweide.tif", layer1) as layer2,
)
    res = layer1 * layer2
    res.to_geotiff("result_in_wgs84.tif")

That said, you want to be careful making this sort of thing too easy, as it hides an expensive operation, and so if you're doing this often you're better translating it once beforehand using something like gdalwarp. Also, if these are at different pixel densities, perhaps it's not obvious which one will be more lossy and you'd be better reprojecting the other way around. I can see a world in which a future Yirgacheffe perhaps tries to take a more active roll in these. But this is the evolution of Yirgacheffe APIs: I start with something simple and then make it smarter as it works its way into common usage.

The fun now is I believe in Yirgacheffe so much, I've started tripping up when I forget what most people would be consider stupid things to try. For instance, I recently hit an issue where I simplified the species richness calculation in my AOH library to be:

from pathlib import Path
import yirgacheffe as yg

species_aoh_rasters_paths = Path("/where/the/geotiffs/are/").glob("**/*.tif")
species_aoh_rasters = [yg.read_raster(x) for x in lots_of_species_aoh_rasters_paths]

# rasters are fractional area per pixel, and we just need a binary here/not here layer
binary_aoh_rasters = [x > 0 for x in species_aoh_rasters]

# Then add them all up to get number fo species in each pixel
species_richness = yg.sum(binary_aoh_rasters)
species_richness.to_geotiff("species_richness.tif", parallelism=True)

This worked fine, until I pointed it at my attempt to do a species for all 34365 terrestrial vertebrates on the IUCN Red List and ran into a limit put on GDAL by macOS that meant I couldn't open more than 32000 files at a time. Was it perhaps silly of me to try read 32765 files at once? Maybe, though I'd say not on today's super powerful computers, but the point is that was the only thing that got in the way of this working. Not having to align 34000 rasters, not needing to juggle the memory required for them, not trying to work out how to best use the CPU or the GPU for this. All that other stuff Just Works™ for me now, and so I then forget that and bump into things like this.

So yes, it's a silly problem to have, but it's also a win that these days that's the kinda of thing I'm having to solve because other problems have gone away!

Thus it'll be nice to park Yirgacheffe for a while at 2.0 and let it mature, and see how it is applied and what frustrations others have, as it now works for most of what I need it to do. 2.0 is everything I ever wanted in a GIS library, but I very much doubt it is everything for others, but now that's the input I need to motivate for work for it.

A nice, and in my experience rare, place to get to with a project.

Tags: yirgacheffe

Tech notes by Michael Winston Dales

Weeknotes: Closing in on Yirgacheffe 2.0