Weeknotes: 8th May 2023
The week in review
LIFE
Andrew and Alison want to do a variation on the current persistence code, whereby the AoH is adjusted for whether the IUCN data says that a species’ habitat type mapping is marked as “Suitable” or “Marginal”. Currently, the code just ignores marginal habitats, but they want to add a scaled area (e.g., Suitable gives a x1 factor, Marginal a x0.5 factor).
It was an easy enough thing to try, so I made a prototype branch for Alison. It required some changes to Daniele’s library, so I’ve made a PR upstream for those changes (which simplify his code, so hopefully they will be accepted at some point).
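As a rough illustration of the Suitable/Marginal weighting, here is a minimal sketch in plain Python. It is not the prototype branch or Daniele’s library: the function name, the nested-list raster representation, and the habitat codes are all made up for the example, and the 1.0/0.5 weights are just the example values mentioned above.

```python
# Example weights only; the real factors are for the ecologists to choose.
SUITABILITY_WEIGHTS = {"Suitable": 1.0, "Marginal": 0.5}


def weighted_aoh(habitat_map, pixel_areas, species_habitats,
                 weights=SUITABILITY_WEIGHTS):
    """Sum per-pixel area, scaled by how suitable each pixel's habitat is.

    habitat_map      -- rows of habitat class codes (hypothetical codes)
    pixel_areas      -- rows of per-pixel areas, same shape as habitat_map
    species_habitats -- dict of habitat code -> "Suitable" or "Marginal"
    """
    total = 0.0
    for habitat_row, area_row in zip(habitat_map, pixel_areas):
        for habitat_code, area in zip(habitat_row, area_row):
            # Habitats the species does not use at all get a weight of zero.
            suitability = species_habitats.get(habitat_code)
            total += weights.get(suitability, 0.0) * area
    return total


habitat_map = [[1, 2], [3, 1]]
pixel_areas = [[1.0, 1.0], [2.0, 2.0]]
species_habitats = {1: "Suitable", 2: "Marginal"}

scaled = weighted_aoh(habitat_map, pixel_areas, species_habitats)
# Passing a weight of zero for Marginal reproduces the current behaviour.
ignoring_marginal = weighted_aoh(
    habitat_map, pixel_areas, species_habitats,
    weights={"Suitable": 1.0, "Marginal": 0.0},
)
```

Keeping the weights as a parameter means the same function covers both the existing "ignore marginal" behaviour and the scaled variant.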
Swift Yirgacheffe and a GDAL rabbit hole
I didn’t progress this much, but I did realise that my Swift persistence calculator wasn’t factoring in the habitat types properly after the above discussion with Ali, so I added that, both to the IUCN DB model, and to my AoH calculator in Swift.
We had wondered if this was the cause of some of the discrepancy between the results I get in Swift versus Python, but it just so happened that all my test species didn’t have any marginal habitats, so I was doing the right thing by chance in those cases. The error was of the order of 0.02%, which is annoyingly high - too high to just be floating point rounding differences.
I dug into this, and in the end discovered that when I render the vector range data to a raster file, I get a different set of pixels from GDAL, in just a few points on the map.

The white dots are where the two rasters differ, and all the dots are present in the GDAL version and absent in my version. The original looked like this:

My rendering code is quite crude: I just take the vector points, project those onto a grid as polygons, and rasterise them. I read through the GDAL code for rasterising, and in the comments it makes strong claims about how it renders each line:
/* A pixel is considered inside a polygon if its center */
/* falls inside the polygon. This is robust unless */
/* the nodes are placed in the center of the pixels in which */
/* case, due to numerical inaccuracies, it's hard to predict */
/* if the pixel will be considered inside or outside the shape. */
But when I checked the pixels that differed between my rasterisation and GDAL’s, all the pixels have a centre-point that is outside the vector, which is confusing. I then checked all the boundary points on the GDAL raster, and it seems that according to my math, quite a lot of the boundary is outside the vector. GDAL has had a lot of work put into it, so Occam’s Razor tells us that my code is broken somehow, and after tying myself in knots for half a day I’ve backed out of this for now, but I’d like to sit down with Patrick at some point and see if we can work out what’s going on here.
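For reference, the pixel-centre rule quoted from the GDAL comments can be sketched in a few lines of Python. This is neither GDAL’s implementation nor my Swift code, just a naive ray-casting version of the rule, assuming a simple grid with its origin at (0, 0), square pixels, and a single polygon with no holes. All the function names are made up for the example.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray running in +x from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygon, width, height, pixel_size=1.0):
    """Mark a pixel True when its centre falls inside the polygon."""
    return [
        [
            point_in_polygon((col + 0.5) * pixel_size,
                             (row + 0.5) * pixel_size,
                             polygon)
            for col in range(width)
        ]
        for row in range(height)
    ]


def differing_pixels(raster_a, raster_b):
    """List the (row, col) positions where two same-sized rasters disagree."""
    return [
        (row, col)
        for row, (row_a, row_b) in enumerate(zip(raster_a, raster_b))
        for col, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


square = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
mask = rasterise(square, width=4, height=4)
filled = sum(cell for row in mask for cell in row)
```

A reference like this is far too slow for real range maps, but it gives an independent answer to compare against when two rasters disagree on a handful of pixels.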
The other question is: does it matter? The rasterisation is an approximation anyway, so is my 0.02% different answer from GDAL significant? I’ve not seen any discussion anywhere about the trust placed in GDAL - if anyone has links to such things do let me know!
Papers
I put a bit of time into the paper Patrick’s working on. Mostly I was trying to re-structure the argument flow of the paper to tell the story better rather than do anything technical on it.
Ark runc python wrapper
As I mentioned last week, we have an issue with GDAL at the moment on our machines where the installed version of GDAL (3.4) has issues with the python library and numpy, which was fixed as of 3.5 (the latest being 3.6). Last time I updated GDAL on the machines it broke things for a bunch of people, so I’m using this as motivation to do some Ark work and build a containerised environment for the persistence work, which is actively hitting this issue.
Whilst I have a container built already for CI, what I want is a nice wrapper that the ecologists can invoke as easily as they do regular python, but is in fact the containerised version of the environment. Patrick pointed me at runc, which is the low-level tool that docker uses to run containers, and I spent some time playing with that. I’m currently part way through writing a little wrapper that’ll generate the spec needed to mount /maps and the user’s script directory in the container.
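To give a flavour of what such a wrapper might generate, here is a hedged sketch in Python. It is not the actual wrapper: the `build_spec` function, the `/scripts` mount point inside the container, and the choice to mount /maps read-only are all assumptions for illustration. The `mounts` and `process` keys follow the OCI runtime `config.json` layout that runc reads, and a base spec could come from `runc spec`.

```python
import copy
import json
import os


def build_spec(base_spec, script_path, maps_dir="/maps"):
    """Return a copy of an OCI runtime spec that runs script_path in the container.

    base_spec   -- dict loaded from a config.json (e.g. one made by `runc spec`)
    script_path -- the user's Python script on the host
    maps_dir    -- host data directory, mounted at the same path in the container
    """
    spec = copy.deepcopy(base_spec)
    script_dir = os.path.dirname(os.path.abspath(script_path))
    container_script_dir = "/scripts"  # hypothetical location inside the container

    spec.setdefault("mounts", []).extend([
        # Assumption: the shared data is read-only from inside the container.
        {"destination": maps_dir, "type": "bind",
         "source": maps_dir, "options": ["rbind", "ro"]},
        # The script directory is writable so results can be saved next to the code.
        {"destination": container_script_dir, "type": "bind",
         "source": script_dir, "options": ["rbind", "rw"]},
    ])

    process = spec.setdefault("process", {})
    script_name = os.path.basename(script_path)
    process["args"] = ["python3", f"{container_script_dir}/{script_name}"]
    process["cwd"] = container_script_dir
    return spec


base = {"ociVersion": "1.0.2", "root": {"path": "rootfs"},
        "process": {"args": ["sh"]}, "mounts": []}
spec = build_spec(base, "/home/alice/work/run.py")
config_json = json.dumps(spec, indent=2)
```

Writing `config_json` into the bundle directory as `config.json` would then let runc start the container from that bundle.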
The nice side effect of all this is that it means we’re using a known container image when we generate the data, and can tag the results with the container ID/hash, as a baby step towards having the required embedded metadata in results to allow for easy reproduction.
Sync Queues vs Actors
A slightly nerdy one, but I was reading up about how actors work in Swift, as in theory they should remove a bunch of concurrency boilerplate code that I add to all my classes in Swift to make them thread safe. Whilst I like the removal of boilerplate, they feel a bit odd - surely every class should become an actor if you ever think your code will be used in a multithreaded context? I thought I’d got my head around them, then I started reading up on GlobalActors and decided I must have missed something, so I suspect a WWDC video session is in my future.
The coming week
- Get the runc wrapper working so I can stop hand wiring pip environments for people
- Get someone to check my GDAL working
- We have a meeting on Friday about the counterfactual methodology and implementing that
Interesting links
- Mojo: you can read a summary of it here. I’m skeptical that this’ll take off, as there have been too many promises and not enough code, but in theory this might solve a bunch of the Python issues that have driven me to Swift. So, worth watching, and if it works then we should use it, but I’m not holding my breath in the meantime.
- WildLabs: I chatted to Anil about some open source hardware things in conservation, and was reminded by a friend at the weekend of WildLabs, which I think was set up by Cambridge person Jenny Molloy, and which has been actively pushing open source hardware into the field in remote places. If we go down this route in the future, they’d be someone to potentially collaborate with.
- Audiomoth - an open source audio sensor