Weeknotes: 4th August 2025

I spent last week mostly offline up on Sweden's High Coast, which is a world heritage site.

A photo of the High Coast landscape, with trees and hills, and two pairs of legs with two buns resting on them

The High Coast gets its name not from being in northern Sweden, as I'd initially assumed, but because the former coastline now sits very high, thanks to the land rebounding after the ice age. Like much of Europe, this area is still adjusting from having been under a lot of ice, and the High Coast is among the most affected places: it was under 3km of ice at the time and sank 1km. When the ice thawed, the exposed coastline formed, but since then the land has risen about 250m, and you can see on the cliffs and hills in the area where that old coastline was.

Last week (I actually was in the office)

LIFE

I spent a bunch of time recovering from a decision past Michael made that at the time had no consequences, but now does. I guess that description covers a lot of things, but this specific one was topical to this blog, I promise.

The LIFE metric calculates the change in extinction risk due to land-use change per area by taking the impact on a wide range of species (around 30k terrestrial vertebrates) and summing them. The LIFE team have decided to scale that value by the total number of species included in the calculation, so that in future releases, as more species are added, it's easier to compare values. So for this change, all I needed to do was work out how many species we used and divide the result by that number. A simple request that turned out to be a little challenging.
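
As a rough illustration of the scaling step, here is a minimal Python sketch. It assumes, hypothetically, that each species' impact is already available as a flattened list of per-pixel values on a shared grid; the function name `scaled_life_score` is made up for this example and isn't part of the LIFE codebase.

```python
from collections.abc import Sequence


def scaled_life_score(per_species_impacts: Sequence[Sequence[float]]) -> list[float]:
    """Sum per-species impacts pixel by pixel, then divide by the species count.

    Hypothetical input: one flattened raster (list of floats) per species,
    all covering the same pixels in the same order.
    """
    species_count = len(per_species_impacts)
    if species_count == 0:
        raise ValueError("no species contributed to the result")
    pixel_count = len(per_species_impacts[0])
    totals = [0.0] * pixel_count
    for impacts in per_species_impacts:
        if len(impacts) != pixel_count:
            raise ValueError("all species rasters must cover the same pixels")
        for i, value in enumerate(impacts):
            totals[i] += value
    # Dividing by the number of contributing species keeps values comparable
    # between releases that include different numbers of species.
    return [total / species_count for total in totals]


print(scaled_life_score([[0.5, 0.0], [1.5, 2.0]]))  # [1.0, 1.0]
```

The whole result hinges on `species_count` being right, which is exactly where things went wrong.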

Firstly, there is the question of when we consider a species to contribute to the result. As we go through the LIFE pipeline we gradually remove species that aren't appropriate, and for some we don't have enough data to calculate values, etc. In the end we decided that the number of per-species extinction impact rasters generated was the right count. So I added a script to collate that count and then apply it during the final map generation.
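
Counting the rasters can be as simple as listing a directory. This is a hypothetical sketch that assumes one GeoTIFF per species named `<taxon_id>.tif`; the real pipeline's layout may differ. Returning the set of IDs rather than just a number makes it easier to compare against other parts of the pipeline later.

```python
import tempfile
from pathlib import Path


def species_with_rasters(raster_dir: Path) -> set[str]:
    """Return the taxon IDs that have a per-species impact raster.

    Hypothetical layout: one GeoTIFF per species, named "<taxon_id>.tif".
    """
    return {path.stem for path in raster_dir.glob("*.tif") if path.is_file()}


if __name__ == "__main__":
    # Tiny demo with empty placeholder files standing in for real rasters.
    with tempfile.TemporaryDirectory() as tmp:
        for taxon_id in ("1001", "1002"):
            (Path(tmp) / f"{taxon_id}.tif").touch()
        (Path(tmp) / "manifest.csv").touch()  # not a raster, so not counted
        print(len(species_with_rasters(Path(tmp))))  # 2
```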

However, we noticed that the numbers I got did not match those from another script I'd written that generates global rather than per-pixel LIFE values. In theory the method I followed for both paths was the same, but a discrepancy was creeping in somewhere: there were more species in the raster pipeline than in the global summary pipeline.
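
Chasing a discrepancy like this is mostly a matter of comparing which species each path used, not just how many. A minimal sketch, assuming both pipelines can produce a set of taxon IDs (the function name is hypothetical):

```python
def compare_species_sets(
    raster_species: set[str], summary_species: set[str]
) -> dict[str, set[str]]:
    """Report species that appear in only one of the two pipelines."""
    return {
        "only_in_rasters": raster_species - summary_species,
        "only_in_summary": summary_species - raster_species,
    }


diff = compare_species_sets({"1001", "1002", "1003"}, {"1001", "1002"})
print(sorted(diff["only_in_rasters"]))  # ['1003']
```

With a few dozen mismatches out of around 30k species, the named list is far more useful than the difference between two totals.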

After a bunch of debugging, I finally discovered why: my code had multiple ways of handling zero values. In general, if I spot that a species doesn't have enough data to calculate a value, I don't write a file out; I just add that species to a manifest that lists why it was rejected. But in one case, I was still generating a raster for that species, which would then continue through the pipeline. This was (mostly) harmless, as the raster was just full of zeros, so it didn't change the final result, but it meant that when I used the number of rasters as a proxy for how many species contributed to that result, I was getting false positives. It's a simple thing once found, but it's another example of the challenges of these pipelines that process so much data: only a few dozen species were affected out of around 30k in total, so it gets lost too easily if you're not explicitly looking for it.
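
A cheap audit that might surface this sort of thing sooner is to flag any per-species raster containing nothing but zeros. The sketch below is hypothetical: it treats each raster as a flattened sequence of floats rather than reading real GeoTIFFs, and a flagged species isn't necessarily wrong, it's just worth checking against the rejection manifest.

```python
from collections.abc import Mapping, Sequence


def find_all_zero_rasters(rasters: Mapping[str, Sequence[float]]) -> list[str]:
    """Return the taxon IDs whose raster has no non-zero values, sorted.

    An empty raster is also flagged, since all() is True for an empty sequence.
    """
    return sorted(
        taxon_id
        for taxon_id, values in rasters.items()
        if all(value == 0.0 for value in values)
    )


flagged = find_all_zero_rasters({"1002": [0.0, 0.0], "1001": [0.0, 0.3]})
print(flagged)  # ['1002']
```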

It's also an example of how my handling of zero data in pipelines like this has evolved. Initially we'd just not store anything about zero results, as they don't contribute to the end goal, but over time I've learned again and again that you need to keep receipts for everything, even things like this. This came to a head whilst I was working on my STAR implementation, when Chess schooled me in better practice for tracking this as we chased discrepancies between my results and hers, and that is why there is now a full species manifest generated as part of the LIFE pipeline (something Chess already did).
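
One way to keep those receipts is to route every species through a single decision point that always writes a manifest row and only writes a raster on success, so there's no second code path to drift. This is a hypothetical sketch, not the LIFE pipeline's actual code: `impacts is None` stands in for "not enough data", and `write_raster` is whatever function saves the output.

```python
import csv
import io
from collections.abc import Callable, Sequence
from typing import Optional

ManifestRow = dict[str, str]


def process_species(
    taxon_id: str,
    impacts: Optional[Sequence[float]],
    write_raster: Callable[[str, Sequence[float]], None],
    manifest: list[ManifestRow],
) -> None:
    """Record every species in the manifest; write a raster only on success."""
    if impacts is None:
        # Rejected species get a receipt, but never an output file.
        manifest.append({"taxon_id": taxon_id, "status": "rejected", "reason": "insufficient data"})
        return
    write_raster(taxon_id, impacts)
    manifest.append({"taxon_id": taxon_id, "status": "ok", "reason": ""})


def manifest_to_csv(manifest: list[ManifestRow]) -> str:
    """Serialise the manifest as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["taxon_id", "status", "reason"])
    writer.writeheader()
    writer.writerows(manifest)
    return buffer.getvalue()


written: list[str] = []
manifest: list[ManifestRow] = []
process_species("1001", [0.2, 0.0], lambda taxon_id, _: written.append(taxon_id), manifest)
process_species("1002", None, lambda taxon_id, _: written.append(taxon_id), manifest)
print(written)  # ['1001']
print(manifest_to_csv(manifest))
```

With this shape, the count of "ok" rows and the count of rasters written can't disagree, and the manifest explains every species that didn't make it.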

Yirgacheffe

Spotted at Höga Kusten Kaffe Rosteri, a small roastery just outside Nordingrå that specialised in Ethiopian beans (the proprietor himself was from Ethiopia): there is no escape from work :)

Which translates as:

"Yirgacheffe comes from the area around the city of Yigalem in the Sidama province. Our Yirgacheffe beans are washed and ecologically produced. Yirgacheffe is our most fruity coffee with notes of blueberry, strawberry, nectarine and a clear hint of lemon. Appreciated for its fullness and balanced acidity with a fantastic mocha flavour and aroma. Considered one of the world's premium coffees.

Our recommendation

Suitable for filter, press, and cold-brew. Drink as it is and enjoy the aroma. We'll grind the beans however you like."

We got to sample four different beans the roastery had, of which the Yirgacheffe was good, but not my favourite. This means I have a new name ready for my next project...

Continuing my housekeeping work on the LIFE repo from the previous week, I also found a moment to get Yirgacheffe passing type checks with mypy. Under the hood Yirgacheffe is built on a class hierarchy using inheritance, and some of the tricks I'd used there to hide GDAL layers were awkward: technically fine (and covered extensively by tests), but enough to make mypy throw up its hands, and fixing them looked like quite a challenge when I last considered it a year ago. Either my assumptions were wrong or the code has got cleaner over time (or both), because getting mypy to pass didn't take me too long, with just a little shuffling of code between the concrete classes and their common base class.
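
As a generic illustration (not Yirgacheffe's actual classes), this is the sort of shape mypy is comfortable with: the base class declares the methods subclasses must provide as abstract, and shared logic in the base class only relies on those declared methods. `LayerBase` and `ConstantLayer` are hypothetical names.

```python
from abc import ABC, abstractmethod


class LayerBase(ABC):
    """Hypothetical common base class for raster-like layers."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height

    @abstractmethod
    def read_row(self, y: int) -> list[float]:
        """Return the pixel values for row y; each concrete class supplies this."""

    def total(self) -> float:
        # Shared logic only calls the declared abstract method, so mypy can
        # type-check it without knowing which concrete class it runs on.
        return sum(sum(self.read_row(y)) for y in range(self.height))


class ConstantLayer(LayerBase):
    """A concrete layer where every pixel has the same value."""

    def __init__(self, width: int, height: int, value: float) -> None:
        super().__init__(width, height)
        self.value = value

    def read_row(self, y: int) -> list[float]:
        return [self.value] * self.width


print(ConstantLayer(3, 2, 0.5).total())  # 3.0
```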

The only hiccup in all this was when CI started failing due to changes in MLX, but thankfully the MLX team were very responsive on the issue I filed, and CI was back in action in no time. It turns out they'd changed the structure of their package in a way that broke a certain upgrade path, and doing a clean install fixed everything.

I was less successful at continuing to add docs to Yirgacheffe. The way autodoc works in Sphinx exposes so much of Yirgacheffe's internal workings that the documentation was unreadable. So another blocker on a 2.0 release is now an internal restructure to hide things according to the ad-hoc rules Python has for this. Some of this I can do incrementally, but some of it I can't, and given 2.0 will be an API-breaking release where I tidy away a lot of cruft, I might as well do it all at once.
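
The rules in question are mostly conventions: a leading underscore marks a name as internal, and a module-level `__all__` lists what a module deliberately exports. Sphinx's autodoc follows the same conventions by default (`automodule` with `:members:` respects `__all__`, and underscore-prefixed members are skipped unless `:private-members:` is given). The hypothetical helper below just spells out the rule Python itself uses for `from module import *`:

```python
import types


def public_names(module: types.ModuleType) -> list[str]:
    """Return, sorted, the names `from module import *` would import.

    If the module defines __all__, exactly those names are exported; otherwise
    every name that doesn't start with an underscore is.
    """
    explicit = getattr(module, "__all__", None)
    if explicit is not None:
        return sorted(explicit)
    return sorted(name for name in vars(module) if not name.startswith("_"))


# Build a throwaway module to show both rules in action.
demo = types.ModuleType("demo")
demo.read_raster = lambda path: path
demo._gdal_helper = lambda: None
demo.LayerBase = object
print(public_names(demo))  # ['LayerBase', 'read_raster']
demo.__all__ = ["read_raster"]
print(public_names(demo))  # ['read_raster']
```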

I did make a start on the 2.0 API changes in a branch, adding functions for opening files that are more like those in pandas and GeoPandas. I've never been happy with the way opening files works in Yirgacheffe, as it is very class based, and whilst Python is an OO language, that's not commonly exposed in data-science code, so it's something I want to hide away in the next release.
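
To make the contrast concrete, here's a minimal sketch of the function-first style pandas uses (`pandas.read_csv`, `geopandas.read_file`), applied to rasters. `read_raster` and `_RasterLayer` are hypothetical names for illustration, not Yirgacheffe's API; the point is that users call a plain function and the class behind it becomes an implementation detail.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class _RasterLayer:
    """Hypothetical internal class; the leading underscore keeps it out of the public API."""

    path: Path
    band: int


def read_raster(path: str | Path, band: int = 1) -> _RasterLayer:
    """Hypothetical module-level opener, in the spirit of pandas.read_csv.

    A real version would open the file (e.g. via GDAL) here; this sketch only
    validates the arguments and records them.
    """
    if band < 1:
        raise ValueError("GDAL band numbers start at 1")
    return _RasterLayer(Path(path), band)


layer = read_raster("habitat.tif")
print(layer.path, layer.band)  # habitat.tif 1
```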

Baltic Sea Salinity

Last year, whilst on a tourist boat trip through Stockholm's outer archipelago, I was surprised to learn that the Baltic Sea has effectively no tides. Technically this isn't true, but in practice, as you sweep around from the Danish straits, the tidal influence of the broader ocean diminishes to the point where there are no listed tide times for Stockholm. There are some small tides in the Baltic Sea, but those are influenced by things other than the moon.

This year I made another discovery (to me) whilst swimming in the Baltic Sea: it is much less salty than the sea water I'm used to swimming in around the UK (predominantly the North Sea). This led me to find and read this paper on The Salinity Dynamics of the Baltic Sea, which I thought was really interesting. Again the Danish straits limit the influx of saline water from the wider ocean, so inflow from rivers and precipitation dominates at the surface. This is further complicated by the sea having two layers, with the lower-salinity inflow from land and air on top and the higher-salinity inflow from the North Sea at the bottom, and they don't readily mix unless certain conditions are met. It's more extreme still in the area where I was, as it's not deep enough to have that second layer. Thus in the Bay of Bothnia, where we were swimming, the salinity is 4 to 6 g/l versus 34 to 35 g/l in the North Sea.

Anyway, it was a fascinating paper, and it now makes me want to print a model of the terrain under the Baltic Sea.

This week

LIFE

I need to generate some updated LIFE maps that use an alternative habitat map as input, based on Thomas Ball's FOOD work. For that, Tom generated some habitat maps with a more accurate representation of pasture land, and we want to check the impact of using those versus the regular Jung base map we use.

Terrain printing

Finley's weeknotes continue to be full of exciting progress, and I hope to have some time this week to actually try generating some terrain models with it myself.

SSI

There's another call for the SSI's fellowship programme coming up (hat tip to Samantha Wittke on the Nordic-RSE chat channel for sharing that). I meant to apply last time but failed to find the time, so I want to try to put something together for this iteration.

Tags: yirgacheffe, life

Tech notes by Michael Winston Dales
