Weeknotes: Fractional LIFE progress

4 Mar 2026

I’ve spent the last few weeks generating new versions of the LIFE biodiversity metric maps:

Original Hybrid

These maps are an extension of the LIFE maps from the original Eyres et al. paper, which were then improved on in the follow-up by Ball et al. through the inclusion of more accurate farming data. This latest iteration, which I've been working on for the last couple of months, takes that improved map further by refining how we integrate the farming data. It’s quite a significant update, as the original maps are based on the Jung et al. habitat map, which lacked a lot of agricultural data (as acknowledged by its authors). By including more accurate farming land-use data we can see much better the pressure that land-use change puts on extinction risk, as there are even fewer undisturbed areas than we originally calculated.

Original Hybrid

From a geospatial data-science point of view this work has also been quite interesting, as for the first time the full pipeline has been updated to work with fractional pixel values. Before, the input habitat layers were high-resolution (100m) rasters where each pixel was assigned a single fixed land use: forest, farm, desert, lake, etc. But because of the probabilistic nature of trying to integrate the farming data, the input has been split out into one raster per habitat type, and the value in each pixel is the percentage of that cell believed to be of that type.
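To make the shift from one categorical raster to a raster per habitat type concrete, here is a minimal NumPy sketch. It is not the LIFE pipeline code: the habitat codes and the function name are made up for illustration, and I store values as fractions between 0 and 1 rather than percentages.

```python
import numpy as np

# Hypothetical habitat codes, purely for illustration.
HABITAT_CODES = {"forest": 1, "farm": 2, "lake": 3}

def split_into_fractional_layers(categorical, codes):
    """Return one float raster per habitat type, with values in [0, 1]."""
    return {
        name: (categorical == code).astype(np.float32)
        for name, code in codes.items()
    }

# A tiny categorical raster: each pixel has exactly one habitat code.
categorical = np.array([
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
])
layers = split_into_fractional_layers(categorical, HABITAT_CODES)

# With fractional layers a pixel no longer has to be all one thing:
# here the top-left pixel is taken to be 70% forest and 30% farm.
layers["forest"][0, 0] = 0.7
layers["farm"][0, 0] = 0.3

# The fractions for any one pixel should still add up to 1 across layers.
total = sum(layers.values())
```

The useful invariant is the last line: whatever you do to the individual layers, each pixel's fractions should still sum to one.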

We already did this for the Area Of Habitat (AOH) calculation within the LIFE pipeline, as the fractional layers are super useful when downsampling high-resolution data to something less bulky to compute with. If you go from, say, a 100m raster where every pixel is of a known habitat type to a 1km map, having fractional rasters is a great way to say “within this 1km square 40% was farm, 10% was lake...” and so forth. But by now propagating this fractional approach all the way to the input layers, we can start to do interesting things, like adding in data where the location is less certain, or indeed simply taking much higher resolution data, like the Brazil MapBiomas data which is at 10m, and turning that into a fractional input layer that aligns with our other 100m data.
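The downsampling step can be sketched as a block average. This is a simplified illustration rather than what the pipeline actually does: it assumes the fine grid lines up exactly with the coarse one and that every pixel covers the same area, which is not true of a real latitude/longitude raster. The function name is my own.

```python
import numpy as np

def downsample_fraction(layer, factor):
    """Average each factor x factor block of a fractional layer."""
    rows, cols = layer.shape
    if rows % factor or cols % factor:
        raise ValueError("layer dimensions must be multiples of factor")
    # Split both axes into (block index, position within block),
    # then average over the two within-block axes.
    blocks = layer.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# A 20x20 patch of 10m pixels where the left 8 columns are entirely farm.
farm_10m = np.zeros((20, 20), dtype=np.float32)
farm_10m[:, :8] = 1.0

# Going from 10m to 100m means averaging 10x10 blocks.
farm_100m = downsample_fraction(farm_10m, 10)
# farm_100m is 2x2: the left-hand cells are 80% farm, the right-hand cells 0%.
```

The same function covers the 100m to 1km case, as that is also a factor of 10.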


Updating the LIFE pipeline to use this method is in theory not that hard, as it's a minor change, but it had a lot of ripple-through consequences that I needed to keep track of. That is one of the pains of working with these large data-science pipelines, particularly when changing the early input stages. Not only does it mean touching a little bit of code in many places, but re-running things each time is also quite slow, making it challenging to keep a consistent mental picture of what you've done so far.

To compound things, a lot of the LIFE code is now a couple of years old, and since then I've made major improvements to Yirgacheffe, the geospatial library on which it is built. In some places that meant I could tidy up the syntax used to invoke Yirgacheffe, as I've made the API closer to that of NumPy or Pandas (other data-science libraries), and in other places I was able to replace entire files with one call to Yirgacheffe (Yirgacheffe will now sum a batch of rasters for you in parallel, whereas before I had a couple of hundred lines of code to do that).

With all this change, it was inevitable that something would go wrong at some point, and on my first attempt to run the full pipeline I spotted that for most of Europe you could now faintly see the coastline, which you could not in the previous maps:

A screenshot of an artificially coloured map showing areas of Scotland. Most of it is white, both the land and the sea, but a few bits in the central Highlands are coloured pink and black, and the entire coastline of Scotland is coloured pink.

I was worried that it was a bug in how I integrated the new farmland data. One of the tricks we have to do is squash the changes suggested by the alternative datasets into eligible land. The farming data is at 10km resolution, and I'm trying to distribute that over a map that has 100m per pixel. But I can't just use any old 100m pixel within each 10km cell: I need to avoid turning cities and lakes into farms, and instead try to pick from grasslands or forests, which are much more likely to have been misclassified originally. So my initial assumption was that perhaps I'd failed to scale the data properly, and near the sea we were trying to shoe-horn 100km^2 of farm into some small fraction of beach.
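To illustrate the kind of scaling I was worried about, here is a rough sketch of spreading one coarse cell's worth of farm over only the eligible fine pixels. This is not the method used in the pipeline, just one simple way to do it: the function and argument names are hypothetical, and it assumes all fine pixels have equal area.

```python
import numpy as np

def spread_farm_over_cell(target_farm_fraction, convertible):
    """Spread one coarse cell's farm area over its fine pixels.

    target_farm_fraction: share of the coarse cell believed to be farm (0..1).
    convertible: 2D array giving, per fine pixel, the fraction currently
        classed as something we are willing to convert (say grassland
        plus forest). Cities and lakes contribute 0 here.
    Returns the farm fraction to assign to each fine pixel.
    """
    target_area = target_farm_fraction * convertible.size  # in fine-pixel units
    capacity = convertible.sum()
    if capacity == 0:
        return np.zeros_like(convertible)
    # Never convert more land than is actually eligible.
    scale = min(1.0, target_area / capacity)
    return convertible * scale

# A 4x4 cell: the top half is grassland, the bottom half is lake.
inland = np.zeros((4, 4))
inland[:2, :] = 1.0
farm_inland = spread_farm_over_cell(0.25, inland)
# 25% of 16 pixels is 4 pixels of farm, spread over 8 eligible pixels at 0.5 each.

# A coastal cell with a single eligible pixel: the cap stops us from
# cramming 8 pixels' worth of farm into it.
coastal = np.zeros((4, 4))
coastal[0, 0] = 1.0
farm_coastal = spread_farm_over_cell(0.5, coastal)
```

Without the cap on `scale`, the coastal case would give that one pixel a farm fraction of 8, which is exactly the sort of mistake that would light up a coastline.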

However, I realised that if that were the case, we'd see it globally, and not just in Europe. The fact that it was so regional implied to me that it was actually species related, and that I was triggering some issue tied to a particular animal in that region. Given it's coastal, I wondered if I was accidentally marking some extra area as marine, and so searched for species that like marine habitat and are found in Europe, and quickly narrowed it down to the Black-headed Gull. Looking at the data for the Black-headed Gull, I realised it shouldn't be included in LIFE at all, which is why it wasn't in the previous LIFE maps. What I'd assumed was a bug was in fact some bad data hygiene by yours truly: in September last year one of the researchers asked me to do a run with a few extra bird species in, and I'd not removed them from my overrides list once it was complete. Doh! So it wasn't a bug at all, just human error with data management.

Still, I bring all this up not just to show once again that I'm as good as anyone else at making mistakes, but also because it highlights one of the fun things about working in this sort of domain. Quite often I can track the issues I have with my code down to a particular species or a particular part of the world. Normally in computer science you're fighting against abstract concepts, but here I'm being challenged by a bird that I'd probably see the next time I walked into town from where I live. It's wonderful, after all these years, to be working with decidedly tangible things, even if it is in a computer model.

Tags: weeknotes, life, yirgacheffe, aoh