AOH calculator 2.0, a new LIFE publication, and STAR pipeline updates

AOH calculator

I've spent much of the last couple of years maintaining two biodiversity pipelines: first the one for the LIFE metric, made by a group at the university that employs me, which analyses the impact of land use change on species extinction rates; and then, off the back of that, I built a pipeline for the IUCN's STAR metric, which looks at how different categories of threat to species are spatially distributed, so that for a given project area you can work out which challenges need addressing and give some weighting to their priorities.

Both these pipelines are built upon the notion of a species' Area of Habitat (AOH), which is a way to calculate where a given species is likely to live, based on a combination of roughly where we know the species to live and how its preferences for habitat types and elevation match up with the high resolution data we have about such things today.
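To make that a bit more concrete, here's a minimal sketch of the core idea using plain numpy arrays. This is not the API of my AOH package, nor of Yirgacheffe: the function name, parameter names, and the toy data are all made up for illustration, and a real calculation would read georeferenced rasters rather than hand-typed arrays.

```python
import numpy as np

def binary_aoh(range_mask, habitat, elevation,
               suitable_habitats, elevation_min, elevation_max):
    """Hypothetical helper: True where a cell is inside the species' range,
    has a habitat class the species uses, and sits within its elevation limits."""
    in_habitat = np.isin(habitat, list(suitable_habitats))
    in_elevation = (elevation >= elevation_min) & (elevation <= elevation_max)
    return range_mask & in_habitat & in_elevation

# Toy 3x3 rasters, all assumed to be aligned on the same grid.
range_mask = np.array([[True,  True,  False],
                       [True,  True,  False],
                       [False, False, False]])
habitat = np.array([[1, 2, 1],
                    [1, 1, 3],
                    [2, 2, 1]])            # made-up habitat class codes
elevation = np.array([[100, 200, 300],
                      [900, 400, 500],
                      [100, 100, 100]])    # metres

aoh = binary_aoh(range_mask, habitat, elevation,
                 suitable_habitats={1}, elevation_min=0, elevation_max=500)
print(aoh)
```

Each pixel ends up either in or out of the AOH, which is the simple case; the complications all come from doing this at scale.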

These pipelines are built in Python, and like any good software engineer, I pulled out the common code for the AOH calculation into its own package, which I then published via PyPI for others to use. However, it's always nagged me that the package is actually quite specialised to the two metrics for performance reasons, and any time I need to do some simpler AOH work I end up re-implementing it (the core of the method isn't that complicated, and is quite easy to knock up quickly with Yirgacheffe).

The specialisation stems from the fact that both STAR and LIFE require processing data for tens of thousands to hundreds of thousands of species, and the input maps we use are at 100m resolution. This leads to needing to process petabytes of data, which is then downsampled before publication. In a workshop back in 2024 I sat down with the key folk working on STAR and we agreed that in this context we could safely downsample early and gain significant performance benefits from doing so. That's basically why my AOH package has been a bit cumbersome to use: it's optimised to make life easier for these heavy pipelines, at the expense of making it awkward to generate just a few simple AOHs from off-the-shelf data.


Looking at my near term todos, I have some work on both STAR and LIFE that I need to do, and both of them require me to touch the AOH code and break the existing APIs in some small way, so I used this as an opportunity to address that nagging concern about the APIs being overly optimised: if I'm going to have to make a new major version, I might as well fix this thing that has vexed me for so long.

To that end I have made a pull request that means the AOH package will now do simple "binary" AOHs along with fractional/proportional AOHs, and works both with elevation data that has been downsampled to min/max layers for the big pipelines and with just a regular digital elevation model (DEM). I have also added a bunch of documentation, and removed some special cases where I had added a feature for LIFE that I was then abusing to solve something else for STAR.
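As a rough illustration of the fractional case with min/max elevation layers, here's a sketch in plain numpy. Again, none of these names come from the AOH package itself. It assumes that each coarse cell carries the fraction of its area covered by each habitat class, plus the minimum and maximum elevation found within it, and it assumes that a cell counts as elevation-suitable if its min/max interval overlaps the species' elevation limits at all. That overlap rule is my simplification for the example.

```python
import numpy as np

def fractional_aoh(range_mask, habitat_fractions, elev_min, elev_max,
                   suitable_habitats, species_elev_min, species_elev_max):
    """Hypothetical helper: fraction of each coarse cell that is suitable habitat,
    zeroed outside the range or where the elevation intervals do not overlap."""
    suitable = np.zeros(range_mask.shape, dtype=float)
    for code in suitable_habitats:
        if code in habitat_fractions:
            suitable += habitat_fractions[code]
    # Assumption: any overlap between the cell's elevation span and the
    # species' elevation limits makes the whole cell eligible.
    overlaps = (elev_min <= species_elev_max) & (elev_max >= species_elev_min)
    return np.where(range_mask & overlaps, suitable, 0.0)

# Toy 2x2 coarse grid with made-up values.
range_mask = np.array([[True, True],
                       [True, False]])
habitat_fractions = {
    1: np.array([[0.5, 0.25], [1.0, 1.0]]),
    2: np.array([[0.25, 0.0], [0.0, 0.0]]),
}
elev_min = np.array([[0, 600], [100, 0]])
elev_max = np.array([[200, 900], [400, 100]])

aoh = fractional_aoh(range_mask, habitat_fractions, elev_min, elev_max,
                     suitable_habitats={1, 2},
                     species_elev_min=0, species_elev_max=500)
print(aoh)
```

In this sketch, passing the same DEM array as both elev_min and elev_max reduces the overlap test to an ordinary per-cell elevation check, which is one way a single function could serve both kinds of elevation input.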

It's kinda a week of work that no one will be happy about (it doesn't actually move me on with either of my outstanding tasks on STAR or LIFE), but it will hopefully make life better for others in the future, even if that's just the people who maintain these pipelines after I move on to other things one day.

LIFE Case Studies paper

The LIFE crew (which includes myself) have a new publication out: "Informing conservation problems and actions using an indicator of extinction risk: A detailed assessment of applying the LIFE metric" by Eyres et al., published in the journal Biological Conservation. This paper, which was mostly Alison's work, asks: having designed a biodiversity metric, how does it actually apply to real world analysis? Alison looks at five different case studies of attempting to apply the LIFE metric, seeing where it works and where it doesn't. Some of the "where it doesn't" was where I contributed, helping Alison do root cause analysis on why the LIFE metric wasn't matching on-the-ground observations. The tl;dr on that is basically that the input layers we use have errors in them too in some areas, and those obviously impact our outputs.

This motivates something I think we need to look into more, and something I hope we as a broader group on the CS side of things will start to tackle this year: how do you make this kind of spelunking through these large data pipelines easier to do? I was able to navigate all the data that goes into the metric somewhat easily, as I have it all in an SQL database and I know how to write spatial queries in that, and I had all the individual AOHs still on disk so I could track the malformed data as it flowed through the pipeline. But it's not an easy task, and if you're not in possession of a full working model of the pipeline in your head, both the code and the data, then it must be a very daunting undertaking. All the information is somewhere in the pipeline; we just choose to throw it away. This relates to Patrick's PhD topic, and to my silly experiments with hacking provenance tracing into Python programs, but it's a very real problem when building these systems.

STAR pipeline updates

That sort of tracking isn't just for papers either. I often compare the output of my STAR pipeline with the official IUCN pipeline by Franchesca Ridley (aka Chess), and if we differ at all, then I have to do a similar level of spelunking to understand why.

This week I've been updating my STAR pipeline to use new input layers Chess has prepared for publication alongside the upcoming STAR update publication (currently in review), and once again I have a sense of fear and foreboding about such updates, and it's because I know that there's a risk that my new results won't match, and then I have to don my spelunking hat and start following numbers through a lot of code.

Thankfully though, the causes of the updated STAR run failing were easier to ascertain this time: I need to properly propagate the breaking API changes I made for the AOH package, which at least gives me a gentle start to the new week fixing that.

Tags: weeknotes, aoh, python