Weeknotes: 25th August 2025
Last week
Yirgacheffe
A number of small improvements to Yirgacheffe this week, including some that are not from me (thanks Dan!):
- I fixed a bug in how I'd handled NODATA in GroupLayers when there's no overlapping data. Alas, all my test cases had overlapping tiles, so I'd failed to spot that I'd used the wrong NumPy call, which Finley then spotted.
- Some CI tweaks by Dan to bring things more up to date, and one by me to stop publishing to PyPI with every merge to main: that works fine if it's just me working on the project, but doesn't scale to more contributors.
- More internal code shuffling ready for the 2.0 release so I can have better autodoc docs. I'm running out of things I can do without breaking the 1.x APIs, so perhaps I need to actually do the 2.0 branch now.
- Expanded the 2.0 API style helpers so I could port my AOH library to the new simpler API
- Bought, but not set up yet, a domain to host the docs that I want to generate with 2.0
I have to confess I might take a break before the push to 2.0, as after a couple of weeks of housekeeping I was starting to feel a little unmotivated. I'd like to get 2.0 done ready for when the PROPL paper drops, which is in a month and a bit, but I can probably focus on other things this week.
PROPL paper
I addressed most of the feedback on the paper I submitted to the PROPL workshop on Yirgacheffe. I passed it over to Anil for a final pass before we upload the camera ready.
Claudius
Claudius, the retro-style graphics library I made for OCaml, is now available via opam, thanks to Outreachy intern Shreya Pawaskar. It was suggested that we shout about it on the OCaml forums, but before I do that I want to get the odoc docs on the Claudius domain (again, bought but not set up), and I spent a tedious but necessary few hours tidying up the examples so that they're a little easier to navigate.
LIFE
I made a start on how I might implement fractional edge-effects for AOHs, but that is still at the pencil and paper stage.
TESSERA
I had my first attempt at using the Tessera foundation model, which takes high-resolution (10 meter) satellite data and has built a 128-dimension model of the earth from that, which I want to use for some ideas I've had around habitat maps. However, after I downloaded some coastal tiles using the new Python bindings I found the coast data suffered from clipping issues similar to ones I've seen on other datasets, which I filed as a bug. I've written before about a similar problem I have with some of the IUCN range data, so Tessera is at least in good company :)
It's an easy, and common, mistake (in my opinion) to try to clip datasets to the coast. Logically it kinda makes sense: if you only care about terrestrial OR marine habitats or species or whatever, you should mask out the other domain. However:
- Coastlines are very detailed, so to do it well requires very complex polygon regions that get very large, very quickly. The classic example is in my STAR pipeline, where processing the Sooty Shearwater takes almost as long as processing all the other birds, on account of its highly detailed coastal range.
- To avoid this you might be tempted to use lower resolution polygons with fewer points, which then clip into the other domain when you don't want them to (which is what I suspect is happening here). Ultimately even the high resolution ones will suffer from this if you zoom in enough, but what is "good enough" depends on the accuracy of your other datasets.
- No two datasets can agree on where the coastline is, so actually aligning your mask to your data is a fool's errand anyway. Given the spatial resolution of the source satellite data for Tessera is 10m per pixel, that's finer granularity than the difference between high and low tide on many beaches, so just what do you define as land and sea at that point? Along a single coastline you'll probably find different tide points in the same raster dataset depending on when the individual images were captured.
Ultimately, in both this case and in things like the IUCN range data, I wish people would just put a buffer around coastal lines and allow the specific underlying datasets (the satellite data in Tessera's case, or the habitat maps for AOHs when using IUCN range data) to define the coast line. A buffer would remove omission errors and be computationally a lot nicer to work with.
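To make the buffering argument concrete, here is a toy sketch in plain Python. It is not code from Tessera, Yirgacheffe, or my AOH pipeline: the grids, the `dilate` and `omitted_land` helpers, and the buffer radius are all made up for illustration. It compares clipping against a simplified coastline with clipping against the same coastline after buffering, and counts how many land pixels (as defined by the underlying data) get thrown away in each case.

```python
def parse(rows):
    """Turn rows of '#' (inside) and '.' (outside) into a grid of booleans."""
    return [[cell == "#" for cell in row] for row in rows]


def dilate(mask, radius):
    """Grow every True cell by `radius` cells in all directions (a square buffer)."""
    height, width = len(mask), len(mask[0])
    out = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[ny][nx] = True
    return out


def omitted_land(land, region):
    """Count cells the underlying data says are land but the region mask discards."""
    return sum(
        1
        for land_row, region_row in zip(land, region)
        for is_land, keep in zip(land_row, region_row)
        if is_land and not keep
    )


# Hypothetical land/sea split as seen in the underlying raster (wiggly coast).
land = parse([
    "#####...",
    "######..",
    "####....",
    "#######.",
    "#####...",
])

# Hypothetical low-resolution coastal polygon, rasterised: a straight line
# that cuts through the headlands.
coarse_mask = parse([
    "####....",
    "####....",
    "####....",
    "####....",
    "####....",
])

# Same polygon with a generous buffer: the underlying data now decides
# where the coast is, because every land pixel falls inside the region.
buffered_mask = dilate(coarse_mask, 3)

print("omitted with tight clip:", omitted_land(land, coarse_mask))
print("omitted with buffer:   ", omitted_land(land, buffered_mask))
```

In this toy case the tight clip drops seven land pixels and the buffered version drops none. The buffered region does include some sea pixels, but that is harmless if the dataset itself already distinguishes land from sea, and a buffered outline needs far fewer vertices than one that tries to trace every inlet.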
I'm still keen to make use of Tessera, as I think it's one of the rare cases at present where a machine learning model is the right tool for the job rather than being a fashion item. But given how important coastal regions are when working with species data, I need to get this issue resolved before I do too much on this front.
This week
- I have more meetings than I'd like this week, but that's what I get for working remotely up north for the last fortnight.
- Try implementing my edge-effects ideas
- Make sure the PROPL camera ready is submitted
Tags: yirgacheffe, propl, claudius, tessera