Weeknotes: 21st July 2025

Last week

LIFE and Yirgacheffe

Between my wanting to do a new revision of the LIFE biodiversity maps using more recent input data, and some talk in the LIFE team of a new potential paper, I spent a chunk of the week tweaking the LIFE pipeline.

The initial motivation was to add support for a couple of other land-use change scenarios that we'd been looking at, alongside updating both the IUCN Redlist and the Jung habitat layers. Whilst doing all this I realised a lot of the early stages of the pipeline were written against much earlier versions of Yirgacheffe, my declarative geospatial library, and so were much more verbose than they needed to be. Thus a day or so of spring cleaning commenced.

The benefit of updating code to newer Yirgacheffe is just an extension of why Yirgacheffe is, to my mind, a good thing. Yirgacheffe has pretty good test coverage, so moving as much complexity from the method code into Yirgacheffe as possible means the code is more likely to be correct, particularly as the resulting method code is much simpler. When I wrote the original LIFE pipeline Yirgacheffe was still evolving, and I had to make heavy use of the numpy escape hatch I'd built in, which involves Python lambdas etc. So in doing the tidying I end up with a clearer expression of intent in the LIFE Python code, and a sense of better robustness overall.

That said, making changes like this always makes me nervous, as large data-science projects have this habit of letting a simple mistake get lost in the sheer volume of data you're processing, only to cause heartache when you discover it months later.

Yirgacheffe already has good test coverage, which helps, and both Yirgacheffe and the LIFE pipeline have pylint run automatically on any code changes, but to help even more I added mypy type checking to the LIFE pipeline alongside pylint, and I also brought the test suite for Yirgacheffe under pylint. None of this makes mistakes impossible, but hopefully it makes them less probable.

In trying to add the new scenarios I did spot a shortcoming in Yirgacheffe: if you used one of the constant number layers on its own (say you wanted to make a raster that was all 1s) and tried to save that then you'd crash and burn, as there was an assumption within Yirgacheffe that input layers would always have geospatial dimensions, and constant number layers, where every pixel has the same value, do not. However, it was easy enough to spot this was happening and, in this particular corner case, take the dimensions from the layer you're saving to and use those instead.
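As a rough illustration of that corner case, here is a minimal sketch in plain numpy. It is not Yirgacheffe's actual code or API: `ConstantLayer` and `save` are hypothetical names made up for this example, and a bare array stands in for the layer being saved to.

```python
import numpy as np


class ConstantLayer:
    """Hypothetical stand-in for a layer where every pixel has the same value."""

    def __init__(self, value):
        self.value = value
        # A constant has no geospatial dimensions of its own.
        self.shape = None

    def read(self, shape):
        # Produce a block of the requested shape filled with the constant.
        return np.full(shape, self.value)


def save(layer, destination):
    """Write `layer` into `destination`, borrowing its shape if the layer has none."""
    shape = layer.shape if layer.shape is not None else destination.shape
    destination[...] = layer.read(shape)
    return destination


# A toy 3x4 destination raster, filled with 1s from the constant layer.
destination = np.zeros((3, 4))
result = save(ConstantLayer(1.0), destination)
```

The point is only that the missing dimensions are resolved at save time, from the thing being written to, rather than being required up front.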

In adding more LIFE scenarios, it was also time to retire from the runner script some of the scenarios from the original paper that were used to prove a point in that specific paper, but are not generally needed any more. People can still run them by hand if they like, but when a pipeline like this has a total runtime measured in days, keeping it to a sensible working set of scenarios is important. I'd still like to replace the runner script with something like Bon in a Box, but this wasn't the time for that.

With that done I did a complete run of LIFE for all the new scenarios, which took another day - given how much code I'd touched I was nursing this run by hand so I could catch any errors early. Here's a species richness map that is one of my intermediary tests of whether the pipeline is generating anything sensible.

A map of the earth showing all the seas as white, and the land as mostly close to black with lighter patches where there are the most species in one place, which for this dataset of terrestrial vertebrates is between the tropics, particularly in the Amazon.

This species richness map just shows the number of species in each pixel, and is for the particular set of terrestrial vertebrates that are selected from the IUCN Redlist by the LIFE methodology. Whilst a coarse view over the roughly 30k species in the pipeline, it does show that we're getting a sensible distribution with most biodiversity being between the tropic lines.
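The idea behind a richness map is simple enough to sketch in a few lines of numpy. This is only an illustration with toy arrays: the real pipeline works on per-species rasters via Yirgacheffe, and `species_richness` is a hypothetical helper name, not something from the LIFE code.

```python
import numpy as np


def species_richness(range_masks):
    """Sum boolean per-species presence rasters into a per-pixel species count."""
    stack = np.stack([mask.astype(np.uint32) for mask in range_masks])
    return stack.sum(axis=0)


# Three toy 2x2 presence masks, one per (made-up) species.
species_a = np.array([[1, 1], [0, 0]], dtype=bool)
species_b = np.array([[1, 0], [0, 0]], dtype=bool)
species_c = np.array([[1, 0], [1, 0]], dtype=bool)

richness = species_richness([species_a, species_b, species_c])
```

Here the top-left pixel ends up with a count of 3 because all three toy species are present there, and the bottom-right pixel stays at 0.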

Thankfully the entire LIFE run with the new scenarios added seemed to run through fine, and now I need to review the results with Ali this coming week.

3D-printing maps

Finley continues to make progress with the maps, and has started keeping weeknotes, so I won't spoil them too much here. But it was exciting that we got our first prints made from his models at Makespace:

A photo of a small 3D-print, about 5cm per side, of some hilly landscape with what looks like rivers

We also were inducted on the Prusa 3D-printers that the computer lab has, which is great as it frees up Finley to make test prints without having to wait for me to be free. The only downside is that the Prusa set-up at the lab doesn't support multi-colour printing (they have an AMS, but it's not been reliable enough for them to want to support it), and so for those prints we'll still need to use the printers in Makespace.

Other things

I had some fun discussions with Shreya and Patrick about how to get packages into opam, and we had a general Outreachy catchup call between the OCaml projects, which was good. I sat in for a bit of the Tessera foundation model workshop, which was interesting, and had a good catch up with David Coomes on some aspects of the plant project and about how I could help one of his summer students. I started looking at how numpy generate their documentation as I'd like to do something similar with Yirgacheffe, and that made me sad. I'm sure there are more things in there, but having spent a weekend dealing with a vehicle breakdown, any more details have currently vacated my mind :)

This week

I need to add the final changes to the LIFE update, and we now want to publish layers with the value scaled by the number of input species, to make them more comparable across different runs as the Redlist continues to add more and more species.
Make sure the LIFE team have all the data they want this week, as I'm away on PTO next week
Try out the lab Prusa printers
Make sure both Shreya and Finley have all they need for next week when I'm away also
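
For the scaling mentioned in the first item, a minimal sketch of what I have in mind, assuming the scaling is a plain division by the species count (the exact normalisation isn't settled, and `scale_by_species_count` is a hypothetical name). The arrays and counts are toy values, not real LIFE output.

```python
import numpy as np


def scale_by_species_count(layer, species_count):
    """Divide a summed layer by the number of input species that went into it."""
    if species_count <= 0:
        raise ValueError("species_count must be positive")
    return layer / species_count


# Two toy runs: the second includes more species, so its raw values are larger.
run_a = np.array([[3.0, 6.0], [0.0, 1.5]])  # built from 3 made-up species
run_b = np.array([[4.0, 8.0], [0.0, 2.0]])  # built from 4 made-up species

scaled_a = scale_by_species_count(run_a, 3)
scaled_b = scale_by_species_count(run_b, 4)
```

With the raw values the two runs look different purely because one had more species in it; after scaling they can be compared directly.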