Weeknotes: 15th December 2025

I wonder if next year I need to alter my approach to weeknotes, as they're getting to the point where some sections could be their own post, and I have to cut them short both for time (today's weeknotes took about half a day to write, which isn't sustainable) and out of consideration for reader patience. Perhaps something between what I've been doing and Mark Elvars' almost-daily notes. I don't think I have the energy to do actual day notes, but more frequent, shorter notes of some sort might suit things better, and make it easier to post things on Rogue Scholar, where I think one topic per post fits better. I'd still be posting incomplete work, in update rather than completion form, though. Thoughts welcome.

Last week

Validation meeting

Coming as I do from a software engineering background, and having been schooled the hard way about the importance of automated testing, the absence of it from data-science pipelines doesn't help with my blood pressure levels. Sure, I add unit tests, linters, and other things at my disposal to the code and have those run via CI, but for the actual pipeline in use, that sort of testing is notably absent and relies on manual checks.

This causes me a reasonable amount of stress, and I need to address that somehow. For instance, both the LIFE and STAR pipelines I've been working on share some common code where the method overlaps: although the two pipelines evaluate different aspects of impacts on biodiversity, they both build upon a common technique, the Area of Habitat (AOH) calculation, and so that code has been pulled out into a shared package, which in turn relies on Yirgacheffe, the declarative GIS library I use for just about every project I work on. At any one time I'll typically be focussed on either LIFE or STAR when updating the AOH library, and so there's a risk I'll break the other pipeline when I do so.

Unfortunately unit tests only get you so far, and the pipelines are big, day-long tasks that generate terabytes of data, so whenever I context switch from one project to another, I have this worry: have I broken it? And because I'm not an ecologist, how would I know? We generate these high resolution output rasters that I can glance at, but I'm not a domain expert, so if a pixel switches from one value to another I'm either unlikely to notice or unlikely to understand if it's significant. If it all goes blank or we miss a continent then sure, I'll notice, but these pipelines aggregate data so heavily it's hard to spot if, say, a single species happened to go missing out of the 30,000 we process.


To try and improve this situation, I've been looking at ways we can automate understanding of the data-pipeline results. The starting point for this has been the Dahal et al AOH validation method. This paper proposes two quality checks for AOH maps, one being a statistical analysis over a corpus of AOHs, and the other using occurrence data from GBIF to check individual maps.

Whilst it appears on the surface that this might be the thing I'm looking for to help automate away needless worry, there are two caveats on the claims of the paper itself:

  • The method only flags outliers and recommends review by a human; it doesn't give us a right-versus-wrong answer
  • The paper only uses the method to compare the results between two runs where a variable in the process has changed: that is to say, whether you got more or fewer outliers than last time

On top of the caveats placed on it by the paper, I had some other concerns with the method:

  • The paper only studied the suitability for birds and mammals, and we process more taxa than that in both the LIFE and STAR pipelines. Does the statistical analysis carry over? What will happen when we add very different species like plants in the future?
  • When I last ran this method I pulled down nearly a billion occurrence records from GBIF, and those numbers will only get worse over time. This makes it somewhat unwieldy as an automated check every time I update the code.
  • The stats side is based on a comparison of all AOHs in the set generated, which means if we change the species included then the results are no longer comparable. This is important as year on year the species that get into the two pipelines might change as the species assessments change.

Having discussed this with both Chess Ridley on the STAR side and Alison Eyres on the LIFE side, and knowing that we had at least one of the original authors, Stuart Butchart of Birdlife International, in the Cambridge Conservation Initiative, of which Alison and I are both part, I managed to arrange a meeting for us all to discuss it as a group. So this Monday that group, along with Paul Donald (also an original author on Dahal et al, and also at Birdlife International), Shane Weisz (a first year PhD student in the computer lab), and Tom Starnes from the IUCN, had an hour-long discussion covering both why the validation method does what it does, and ways in which it might be improved.

The key takeaways for me were:

  • In general occurrence checking is considered better than the statistical model checking, but at the time of the original paper they were limited in how much occurrence data there was, so needed the model checking to get better coverage. At the time, the habitat map they were using had been based on existing GBIF data, and to avoid testing with the same data that was used to build the inputs to the AOHs, they had to restrict themselves to occurrences recorded after that paper was published, which at the time was quite recent. On one hand, that is much less of a problem today, but my personal instinct, based on discussions with people trying to make plant AOHs, is that it might still be relevant, as there are a lot of plant species with very few occurrence records.
  • For model checking we should ideally calculate all possible AOHs for the statistical model checking, not just the ones required for the specific biodiversity metrics we're processing. This would address the churn concerns as species are re-assessed, at least for taxa that are complete in the IUCN Redlist like birds.
  • I was concerned the species variables used in the statistical model checking might be better suited to mammals and birds (e.g., elevation preference), but it was felt that the variables picked are broadly applicable. That said, I'd still like to do an analysis, given how much more data we have today for the other taxa, to see if that does hold.
  • Paul suggested a couple of ways we might reduce the number of occurrences required to get a reasonable confidence in the prevalence value by repeated sampling, which I think the group felt is worth pursuing, as this might unlock a much lighter-weight version of the occurrence checking, meaning more checks could be run more regularly.

Overall we left the meeting with a better understanding of what the Dahal et al method is trying to achieve, and some ways in which we can look to incrementally improve on it to make it easier to run and more robust for our pipelines. Hopefully as a loose group of interested people we can find time to chip away at these things in the coming year.


If I've not stressed it enough, things like this are super important, not just for the evaluation of existing pipelines, but also for knowing if new work we do is any good. One of the reasons I've been dragging my heels on exciting possible avenues of work this year, like playing with the TESSERA machine learning model to build new/better habitat maps, is because I have no way to evaluate whether they are indeed better. I think trying to shore up either my own understanding of, and confidence in, Dahal et al, or make improvements to how it can be applied more readily, will help me feel I can tackle such projects without generating things that look pretty but are ultimately of no value.

STAR validation testing

I did one of the follow-on tasks from the validation meeting, as it was suitably low-hanging fruit. STAR assesses the impact of specific threat categories in the Redlist, and so only generates AOHs for species with actual active threats, and it also only processes species that are marked as at higher risk. This reduces the number of eligible species in the IUCN Redlist from around 34 thousand to around nine thousand, and traditionally it is those nine thousand that we run validation on. So, as a small tweak, I updated my STAR pipeline to generate the AOHs for all 34 thousand species, then only process the nine thousand of interest in the later stages of the pipeline, but use all of them in model checking.
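In pipeline terms the change is small. Roughly speaking it looks like the following, though this is an illustrative sketch rather than the real STAR code, and the helper functions and attributes are hypothetical stand-ins for the actual pipeline stages:

# Illustrative sketch only: load_redlist_species, generate_aoh, run_model_validation,
# run_star_stages, and the star_eligible attribute are hypothetical stand-ins.
all_species = load_redlist_species()                        # ~34 thousand species
star_species = [s for s in all_species if s.star_eligible]  # ~9 thousand species

# Generate AOHs for everything we can, not just the STAR-eligible subset...
aohs = {s.taxon_id: generate_aoh(s) for s in all_species}

# ...let the statistical model validation see the full set of AOHs...
run_model_validation(aohs)

# ...but keep the later, more expensive STAR stages restricted to the eligible species.
run_star_stages({s.taxon_id: aohs[s.taxon_id] for s in star_species})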

My instinct was that this might reduce the number of AOHs marked as outliers requiring inspection, as there would be more similar looking AOHs generated. But it turned out my instinct was incorrect, and it actually increased the number of outliers within the set of species considered for STAR:

  • Outliers just using STAR eligible species as an input to model validation: 14
  • Total outliers using all AOHs as an input to model validation: 75
  • Outliers of the STAR eligible subset when calculated using all AOHs: 23

The numbers are still relatively small, but going from 14 to 23 is an increase of over 50% in the outliers flagged. Given that AOHs are relatively cheap to generate, I feel this signifies we should make this switch on both pipelines and generate all AOHs for the validation stage. Yes, it'll make the AOH generation for STAR take 2 to 3 times longer, but we're talking an hour or so currently, so up to a few hours, and if that helps build confidence in the results, then I think it's worth it. I'll see what the others think in the coming week.

Chat with Jovana on foundation models and habitat maps

As mentioned above and previously in this blog, I've been hoping to look at using the TESSERA foundation model to build potentially better habitat maps as an input to LIFE and STAR, but have been both too busy generally and wanting to prioritise how I'd evaluate any results first. However, I had a chat with Jovana Knezevic this week, who is looking into TESSERA and habitat maps as part of her PhD work; her current project is comparing TESSERA to the Jung habitat map that we use for the LIFE AOHs. I was keen to see if I could support this effort: Jovana understands the foundation model work much better than I do, and I have a bunch of experience in AOHs she currently lacks, so there's a potential collaboration here.

Jovana's approach to this is different from the one I'd planned on taking, which would have looked more like the way Lumbierres et al approached generating their habitat crosswalk. Instead, Jovana is much more interested in exploring the latent spaces in TESSERA to see how they align with maps like Jung's. However, given that before the conversation I had no idea what latent spaces were, I feel this is exactly why I should find a way to work with Jovana on this if I can! I've generally not been interested in machine learning, as I've not before worked in a domain where it is so obviously well suited, and now I do have a use case for it I have a lot of learning to do.

Deep dive on map upscaling/downscaling

In what was possibly the most enjoyable workday I've had in a while, I spent a day last week trying to refine a 10m land cover map down to a 1m version by mixing it with other data sources and just working it out as I went, with the occasional reference to the literature when I got stuck: I find I learn best by doing, failing, and then learning; if I just start by reading papers or books they never stick as well.

There are a few motivations for this, one of which is fun and I'll share in due course, but the main work one is that I've been looking at various bits of work, including some I've helped with, where we're trying to mix data at different resolutions to create hybrid maps, and I've been unsatisfied with the results. So this is going back to basics to try and understand what it is people do in this space in general, and then try applying that back to the specific ecology problems we've been discussing in our groups.

One important thing to get out of the way: the terminology here confuses me, as different domains use opposite terms for the process of taking low resolution data and making it higher resolution. In video games and consumer electronics like TVs this is known as "upscaling", I guess as you're making the number of pixels go up. However, in geospatial/GIS domains it is called "downscaling", I guess as the size of each pixel is smaller than before. This realisation, which only came to me as I started to explore the literature in the domain, explains some of the confused looks I've got in the past from my ecology colleagues.

For this exploration the particular map I've been playing with is just a forest/river area of Sweden I'm reasonably familiar with; it has no direct significance to any of the projects I've been working on, but that makes it easier for me to evaluate success because I know what I'm looking at. Northern Sweden is also a good test case as they have very good open data maps, including a 1 metre resolution digital elevation map (DEM) and a 10 metre resolution land cover map (a land cover map records whether a pixel is water, forest, road, etc.). The final reason this is a good test area is that whilst it has a mix of water, land, and human impact, it's still a relatively simple landscape.

Here is the land cover map area I'm going to be working with, which is about 12km by 12km. The blue is the river, the light greens are different types of forest, the dark greens different wetlands, and the grey bits man-made.

My goal was to see if I could generate a downscaled 1m version of this by combining the 10m version I downloaded with the 1m elevation dataset.

After one day of trying and learning I managed to make some progress towards that goal, and more importantly I understand the domain better now, thanks to survey papers like Downscaling in remote sensing by Peter M. Atkinson. The kind of approach I'm taking, whereby I combine multiple mixed-resolution sources, is referred to as "Dasymetric Mapping", described for example in Generating Surface Models of Population Using Dasymetric Mapping by Jeremy Mennis. In that paper they're trying to refine census results that are collected at, say, county or ward level down to something more accurate based on other mapping. The paper describes the term as:

Dasymetric mapping may be defined as a kind of areal interpolation that uses ancillary (additional and related) data to aid in the areal interpolation process.

Here is a zoomed-in section of the above 10m land cover map so you can see the actual pixels:

The other data I have is the elevation map, which is at a more detailed 1 metre resolution:

The first observation is that in the DEM, the large bodies of water do have a very uniform elevation value, and so my guess is that I can refine the water edges to 1m by taking the height value from the DEM for any pixel that aligns with a water pixel in the land class map, and looking for the obvious peaks. In Yirgacheffe this is now relatively easy:

import yirgacheffe as yg

# dem_path is the 1m elevation GeoTIFF, lcc_path the 10m land cover GeoTIFF
with (
    yg.read_raster(dem_path) as dem,
    yg.read_raster(lcc_path) as lcc,
):
    # Give me just the land cover classes that should be in the river areas
    water = lcc.isin([
        6, 61, 62, # water
        2, # wetlands
    ])
    # now use that to mask the elevation map (non-water pixels become zero)
    dem_water = water * dem
    # get the unique elevation values and their counts
    elevations, counts = dem_water.unique(return_counts=True)

And indeed the data has some obvious peaks (a hat tip to this post by Jon Ludlum for reminding me that I could embed CSVs in my blog posts!):

Then if I just filter the DEM for those peaks, I get a nice idea of where the large bits of water are:

It's not perfect (you can see there's an area to the left that I lost), but it's a great start.
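For reference, the peak filtering step looked roughly like this. This is a sketch rather than the exact code I ran: the count threshold is illustrative, and I'm assuming the same isin call used on the land cover layer above also works on the DEM layer:

# Continuing from the earlier snippet: pick out the elevation values that account
# for a large number of water pixels. The threshold is illustrative; in practice
# I eyeballed the peaks from the CSV above. The e > 0 check skips the masked-out
# background left over from the water * dem multiplication.
peak_threshold = 100_000
peaks = [float(e) for e, c in zip(elevations, counts) if e > 0 and c > peak_threshold]

with yg.read_raster(dem_path) as dem:
    # Any 1m DEM pixel whose elevation exactly matches one of the peak water
    # levels becomes part of the refined water mask.
    water_1m = dem.isin(peaks)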

The other challenge, though, is that whilst I've straightened out the coastline, this means there will be gaps along the coast where the 10m water pixels used to overlap the banks of the river in the elevation map. These were "water" before, but what are they now? For this I used scipy to do a sort of convolution-matrix approach to find the most frequent non-water surrounding class for each gap pixel, and used that to fill in the gaps.
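The idea, roughly, in scipy form (a sketch along the lines of what I did rather than the exact code; it works on plain numpy arrays of land cover class codes):

import numpy as np
from scipy import ndimage

def fill_gaps_with_commonest_neighbour(lcc, gap_mask, water_classes):
    """Assign each gap pixel the most frequent non-water class among its 3x3 neighbours.

    lcc: 2D array of land cover class codes (at the target 1m resolution)
    gap_mask: boolean array of pixels that were 10m water but are no longer water
    water_classes: class codes that should never be used as fill values
    """
    candidates = [c for c in np.unique(lcc) if c not in water_classes]
    kernel = np.ones((3, 3), dtype=np.uint8)

    # For every pixel, count how many of its neighbours hold each candidate class.
    neighbour_counts = np.stack([
        ndimage.convolve((lcc == c).astype(np.uint8), kernel, mode="constant")
        for c in candidates
    ])

    # Pick the class with the highest neighbour count at each pixel...
    best = np.asarray(candidates)[np.argmax(neighbour_counts, axis=0)]

    # ...and only apply it to the gap pixels.
    filled = lcc.copy()
    filled[gap_mask] = best[gap_mask]
    return filled

One caveat: for a gap pixel whose entire neighbourhood is still water, the argmax just picks the first candidate class, so a real version would need to iterate or widen the window.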

Now, finally, if I use my new water layer and fill in the gaps, I have my updated land cover map, which shows that most of the shoreline is now at a 1m resolution whereas before it was 10m:

What was a pleasant surprise was that the smaller areas of wetlands that I thought would vanish remained, because although the elevation map is at a 1m resolution horizontally, the vertical resolution is finer, and so those wetlands appeared 10cm higher than the surrounding water, which wasn't visible to my eye in QGIS.

There's still a lot to refine here. The sharp-eyed might notice the road that was in the top left of this map has gone missing, as I also removed that and filled it in using the most-frequent-neighbour approach; my plan is to fill it back in from a vector layer rather than try to refine the 10m road pixels using image filtering. I also want to smooth the transitions between the different forest and wetland boundaries. But still, for a day of exploratory coding, I'm pretty pleased with how it turned out, and I learned a bunch in the process.

This week

  • I have a follow on meeting from the STAR workshop a couple of weeks ago that I need to pull some data together for.
  • I have a LIFE meeting on habitat maps on Jan 7th I need to prepare for, which feels like a long time away, but is a week on Wednesday if you allow for the holidays :/
  • I'd like to do some more on the hybrid maps for fun as a wind-down towards Xmas, if time allows.
  • I realised that I might be able to improve how Yirgacheffe handles mismatched data scales, but I need to do some thinking about that.
  • Year notes. I did have some yearly objectives I gave myself for the year, and I mostly failed at those, so I'd like to write that up and try to set some high-level goals for next year again.