Weeknotes: 3rd November 2025

Last week

Yirgacheffe 1.10

Those old enough might recall Mac OS X Snow Leopard, which was unusual as a release of the OS in that it promised "zero new features". Instead Apple were going to spend time on performance improvements and bug fixes, which was as novel as concept at the time as it is today. At the time I was working at a small start-up, building a digital-signage product called CODA[1]. We worked on a three sprint cycle whereby sales/marketing got to drive the deliverables of two out of three sprints, and we in the tech team got to drive the remaining sprint’s agenda. These sprints thus became known as Snow CODA releases.

This week I released the 1.10 release of Yirgacheffe, of which I’m quite pleased, and is definitely a Snow Yirgacheffe release[2]. I’m tempted to say that’s despite there being no user facing new features, but perhaps I should say that’s because there’s nothing seemingly new here to the users, and that perhaps reflects a level of maturity in all the new work I’ve been trying to add this last half year as I turned Yirgacheffe around from being a library for a computer scientist trying to do geospatial work to a library for data scientists doing geospatial work.

So what is in this release that I’m so proud of?

Firstly, I added common subexpression elimination (CSE), a technique from programming language implementation that detects when the same work is being done multiple times and just does it once instead. This is something I’ve wanted to do for a while, as an example of this problem is hidden in the most common example I give of using Yirgacheffe, calculating an Area of Habitat for a species. The code for that looks a little like this:

import yirgacheffe as yg

with (
    yg.read_raster("habitats.tif") as habitat_map,
    yg.read_raster("elevation.tif") as elevation_map,
    yg.read_raster("area-per-pixel.tif") as area_per_pixel_map,
    yg.read_shape("species.geojson") as range_map,
):
    refined_habitat = habitat_map.isin([...species habitat codes...])
    refined_elevation = (elevation_map >= species_min) & (elevation_map <= species_max)
    aoh = refined_habitat * refined_elevation * range_polygon * area_per_pixel_map
    print(f'Area of habitat: {aoh.sum()}')

You can see in the middle there to calculate the refined elevation for a species we find all the areas where the elevation map is over a certain the species lower elevation preference and the area where the elevation map is under the species upper elevation preference and then take the logical and of the two, to find the areas in the elevation map where the species might like to be. In doing so Yirgacheffe actually is reading the elevation map twice, once for each half of the equation. This seems not that significant, but for a species like a Moose or a Bear that can occur over most the northern hemisphere, that can involve reading in quite a lot of data, which involves doing LZW decompression on the GeoTIFF twice, etc. That said, because I use GDAL under the hood for reading GeoTIFF data, and it does do some level of caching, this has never been a big enough annoyance for me to fix it.

However, recently I talked about how I implemented edge effects on AOHs, where in we try to take into consideration the fact that species don’t always like to go all the way to the edges of their preferred habitat types. It turns out doing this with sub-pixel accuracy required quite a lot more nuance in the calculation, and was full of common subexpressions, in particular, not just reading of data, but re-using results that have already gone through several steps of processing, all of which was being re-calculated each time it occurred in the expression.

Rather than throwing the code at you (though you can read it here if interested), it's perhaps easier to understand diagrammatically. Here is the above AOH code rendered as a syntax tree:

A diagram of an expression syntax tree showing a hierarchical computation structure with MUL (multiply) operations at the top levels combining area-per-pixel.tif and species.geojson files, followed by logical operations including ISIN testing elements against habitat.tif, and AND operations combining GE (greater than or equal) and LE (less than or equal) comparisons with elevation.tif bounded by values 0 and 1054.

Hopefully you can relate the diagram to the code snippet above, and you can see it shows where we have a common subexpression, because we can see two arrows going into the "elevation.tif" node, showing two things depend on it.

This here then is the syntax tree for the edge effected AOH calculation, which you can see both contains a lot more nodes, but many more that have multiple inputs on the top side, indicating that the result is being consumed by multiple other nodes:

A diagram showing a complex expression syntax tree with two main sections labeled A and B. Section A (top) shows a simpler tree with MUL operations combining area-per-pixel.tif and species.geojson, with WHERE, GT, and LT operations filtering by elevation and constant values. Section B (bottom) shows a much larger optimized tree structure with SUB, ADD, MUL, WHERE, RSUB operations, multiple CONV2D operations with weight matrices, ASTYPE data type conversion, and ISIN operations testing elements against habitat.tif, demonstrating a transformation from the simpler expression in A to a more computationally efficient form in B.

In particular, note the "MUL" node labelled A that is a narrow point in the diagram about 1/3rd the way down: this is consumed twice, and has most the computation below it, meaning that without CSE Yiracheffe was calculating that entire thing twice! Another interesting node is the "ASTYPE" node labelled B near the bottom, which sits above a filtering "ISIN" operation on the habitat map: this is consumed 8 times, and without CSE we'd calculate the filter 8 times! So now you can start to see how by implementing CSE, I was able to get the edge effect code to run between 4 to 5 times faster.

There was a couple of gotchyas I hit that are specific to how Yirgacheffe works, the main one being the impact of convolution matrix processing, the nodes marked "CONV2D" in the large graph. For each pixel in the result raster, we need to read from the source raster all the pixes under the matrix placed in the corresponding position, and for the 3x3 matrixes used here, that means we need to read in a larger area by 1 pixel on all sides. So that means the line that goes from all the CONV2D nodes to the ASTYPE node need more data than the line that goes from that MUL near the top, and so those aren't actually using the same data, and can't be treated the same. So in fact we still need to calculate the filter twice, once for all the CONV2D instances and once for the MUL. In theory the later use case is a subset of the former, and so I could figure that out, but that added yet more complexity and I felt I'd rather leave that performance on the table for now, and come back to it once the current implementation has some miles on it.

But overall, this is a great step forward for Yirgacheffe, and the start of what I hope will be more under-the-hood improvements performance wise. At some point I want to try optimise the graph, looking for things like constants that will zero out bits of the graph, meaning I don't need to calculate them, or being able to simplify expressions in a way that mean we don't process so much data.


The other improvement I got into 1.10 was fixing a bit of API friction due to my own ignorance of Python inner workings. Yirgacheffe since 1.0 has overloaded the __add__ operator (along with other basic math operators), so that you can write:

import yirgacheffe as yg

with (
    yg.read_raster("1.tif") as layer1,
    yg.read_raster("2.tif") as layer2,
):
    res = layer1 + layer2
    res.to_geotiff("summed.tif")

Nice. You can also add a layer with a constant:

import yirgacheffe as yg

with yg.read_raster("1.tif") as layer1:
    res = layer1 + 42
    res.to_geotiff("incremented.tif")

Great. However, in order to swap those operators (important for non-commutative examples like subtract and division), you needed to annotate the constant like so:

import yirgacheffe as yg

with yg.read_raster("1.tif") as layer1:
    res = yg.constant(42) + layer1
    res.to_geotiff("incremented.tif")

This is because under the hood Python takes lhs + rhs and turns it into lhs.__add__(rhs), and whilst Yirgacheffe types know what to do when a basic Python value like an int or a float is passed to their __add__ method, the inverse is not true, and Python types thrown an error if you do what is effectively 42.__add__(layer1), as they have no idea what to do with Yirgacheffe layers.

I realised that Python numerical library numpy somehow made this work though, and it turns out Python has a clever trick here. If __add__ fails, it'll try calling __radd__ with the operators reversed, so 42.__add__(layer1) is called as layer1.__radd__(42). Then we're back into a world I can control, and I can make that work.

The long and short is that finally you can now write:

import yirgacheffe as yg

with yg.read_raster("1.tif") as layer1:
    res = 42 + layer1
    res.to_geotiff("incremented.tif")

Which seems trivial, but again the edge effect code was full of 1 - layer type things, and all the yg.constant annotations distracted from what the code was trying to do, making it harder for readers to understand what is already a much more complex AOH equation than the standard one. The point of Yirgacheffe is to do the heavy lifting for the user, so I'm super pleased this is fixed, if slightly embarrased I missed this trick for so long.

LIFE

We had a LIFE planning meeting, working out what the team will attempt for the next 6 to 12 months. I'm aware that I've been a bit of a bottleneck at times on this, so I was pleased that I managed to not pick up yet more tasks. It was good to see there are lots of things people in the group want to do pushing forward, and I'll focus on a few things like edge effects and habitat maps where I can do the mechanism that then enables the ecologists to do their bit betterer.

Edge effects

Actually on the edge effects work, Ali and I sat down and worked out how one might integrate edge effect AOHs into LIFE, as it's not very straight forward. In the end I think we convinced ourselves it is possible, but is potentially expensive to compute, so I need to go thing about that it bit and see if my instinct is right or not.

This week

  • Habitat map testing - I want to do multiple runs of the pasture map work we did a month or so back with different random seeds to see how stable the outputs are.
  • Apply the validation method that I wrote a while ago and that Shane has been studying to AOHs generated with the above maps to see if that shows issues with them, and as a way of actually kicking the tyres of my validation code.
  • If I have time I'd like to try tackle a bit of hacking on the OCaml code that runs this website, as others in the group are trying out some new ideas, and it'd be fun to keep pace if I can. If nothing else I should fix the CSS for the footnotes...
  1. Still in use today as far as I'm aware, almost 20 years later.

    ↩︎︎
  2. Which is a very stretched metaphor, as based on a quick look around, the mean temperature in Yirgacheffe varies between 15˚C and 20˚C according to this paper, so it probably doesn't see much snow.

    ↩︎︎

Tags: yirgacheffe, life, aoh