Weeknotes: 1st May 2023

Last week

Swift persistence calculations

Predictably, after thinking I'd finished the basic version of this last week, I then hit a few more stumbles this week that I had to iron out.

Last week I had things working using the ESA CCI data (I think - I need to check with Alison), which is a lower resolution than the Jung-based datasets we use for actual climate calculations - I do this mostly because the Jung base maps are 150 GB each, and my laptop would struggle to house all that.

Anyway, when I did move to the Jung data I immediately hit issues with Cairo, the 2D graphics library we use to render the vector ranges to rasters in the calculations. It turns out that internally it uses a 32-bit packed signed x, y coordinate space, which means you can't have more than +/- 32K on any axis, and so the largest bitmap you can instantiate is 32K in either axis. Given the Jung map is 400k by 200k, this is somewhat limiting.

In the end I worked around this by adding chunking internally, which I probably would have done eventually but hadn't got to yet; instead I was just doing banding, as that aligns with how the TIFF format stores the data.
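
To make that concrete, here's a minimal sketch of the sort of tiling arithmetic involved (in Python rather than the actual Swift, and purely illustrative) - splitting a raster too big for Cairo into chunks that each fit within its per-axis limit:

# Sketch of splitting a large raster into chunks Cairo can handle.
# CAIRO_MAX is the per-axis limit implied by Cairo's packed signed
# coordinate space described above.
CAIRO_MAX = 32_767

def chunks(width, height, max_dim=CAIRO_MAX):
    """Yield (x_offset, y_offset, chunk_width, chunk_height) tiles."""
    for y in range(0, height, max_dim):
        for x in range(0, width, max_dim):
            yield (x, y, min(max_dim, width - x), min(max_dim, height - y))

# A roughly Jung-sized 400k by 200k map works out as a 13 by 7 grid of
# tiles, each small enough for Cairo to rasterise.
for x, y, w, h in chunks(400_000, 200_000):
    pass  # render just this window of the vector range into a bitmap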

But that and a couple of other oddities needed to be squashed.


Performance-wise I'm now starting to get to the point where I have the same single-threaded code running on one of our AMD EPYC compute servers with both the Python and Swift implementations, and the results I get seem to be quite variable, so I need to do a repeated bulk run. It is somewhat interesting that the Swift version runs notably faster on my MacBook Pro than it does on the AMD machine, whereas the Python code is, predictably, a little slower in the same scenario. As Anil points out, this is probably down to how the Swift runtime interacts with Linux versus Darwin (the macOS kernel). Still, Swift is faster than Python in both cases, so it doesn't negate what I'm trying to show longer term.
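
For the repeated bulk run I'm thinking of nothing fancier than the below - a sketch, where the commands are placeholders for the real Swift and Python pipeline invocations:

# Sketch of a repeated bulk-run harness to get less noisy timings.
# Both command lines here are hypothetical stand-ins.
import statistics
import subprocess
import time

COMMANDS = {
    "swift": ["./persistence-swift", "--species", "12345"],          # hypothetical
    "python": ["python3", "persistence.py", "--species", "12345"],   # hypothetical
}
RUNS = 10

for name, cmd in COMMANDS.items():
    timings = []
    for _ in range(RUNS):
        start = time.monotonic()
        subprocess.run(cmd, check=True, capture_output=True)
        timings.append(time.monotonic() - start)
    print(f"{name}: median {statistics.median(timings):.2f}s "
          f"(min {min(timings):.2f}s over {RUNS} runs)")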


In terms of the longer term, I did spot that Apple has a vector math library as part of the Accelerate framework, which I believe is for CPU-based acceleration of numerical work (versus using the GPU). At some point it might be fun to try dropping that in versus the GPU to see how things go. Then we can give Anil's Mac Studio something to crunch ;) Alas, it's not available on the Linux Swift port.


I put in a proposal on this to SwiftConf at the last minute, as I decided I fancied doing a tech talk about the work somewhere. Given it was a last-minute throw I doubt it'll get in, but it's a good reminder for me to look for other venues I could engage with - it's been a while since I did tech conferences regularly, and it'd be nice to get back into them, as I find them a useful source of inspiration.

Var bor svenska älgar? (Where do Swedish moose live?)

One practical upshot of all this work is that, now I know the data formats a lot better, I can start to build my own maps. For fun, given I've been watching The Great Moose Migration this week, I made a map of where meese live in Sweden, using some of the AoH data we have and pulling in country boundaries I found here, which I then converted from a shp file to a gpkg file I could use with Yirgacheffe using this command:

$ ogr2ogr -f GPKG countries.gpkg countries.shp

Which gave me:

[Screenshot: map of where moose live in Sweden]

This is the second version - for the first version I'd used the wrong base map, so needed Ali's help there - thanks Alison!
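
As an aside, a quick way to sanity check the converted GeoPackage before feeding it into anything is via the GDAL/OGR Python bindings. A sketch - the attribute name "NAME" is an assumption, as it varies by dataset:

# Sketch: inspect a converted GeoPackage with the GDAL/OGR bindings.
from osgeo import ogr

dataset = ogr.Open("countries.gpkg")
layer = dataset.GetLayer(0)
print(f"{layer.GetName()}: {layer.GetFeatureCount()} features")

# Filter down to one country and check its bounding box looks sane.
layer.SetAttributeFilter("NAME = 'Sweden'")
for feature in layer:
    envelope = feature.GetGeometryRef().GetEnvelope()
    print(envelope)  # (min_x, max_x, min_y, max_y)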

Ark vs Methodology

File this one under probably obvious to others, but not to Michael. One of the tasks I had to do when switching from ESA CCI data to Jung data was, as I mentioned last week, updating the code for mapping habitats between the respective map's encoding and the IUCN encoding. There isn't a simple one-to-one mapping here, as the IUCN dataset is more nuanced than what is in either GeoTIFF habitat map, so it's not just a format conversion; decisions are also being made about how to change the data. In Daniele's Python code, which I was working from, he has nice docs about what his code does, but not about the decisions made to pick that particular algorithm.

This links nicely to the methodology document that Keshav's produced for the project assessment, which in that instance does explain all the algorithms used and justifies them.

In Ark we have this idea that all outputs will somehow encode all inputs - hash of commits, hash of data sources, links to where they came from, etc. This has made it clear to me that if you want results to be verifiable, you also need to provide links to the design document and its version.
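
To illustrate what I mean (this is purely illustrative, not Ark's actual record format - every field name here is made up), the idea is that each output carries enough to trace both the inputs and the design decisions behind them:

# Illustrative provenance record attached to an output - not Ark's
# actual format; the field names are invented to show the idea.
import hashlib
import json

def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

provenance = {
    "code_commit": "0123abcd",  # git commit hash of the pipeline code
    "inputs": {path: sha256_of(path) for path in ["countries.gpkg"]},
    # The bit this week taught me to include: the design document too.
    "methodology": {"url": "https://example.org/methodology", "version": "1.0"},
}
print(json.dumps(provenance, indent=2))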

HotCarbon

We considered submitting the rejected HotOS paper to HotCarbon, but decided not to in the end, and will most likely aim a respin of it at HotNets at some point. Patrick's HotCarbon paper is nearly done, and I joined Anil and Patrick in a review party last week after giving some feedback earlier in the week.

Pip horribleness

I was reminded of why I wanted to sort out a container environment solution for the persistence pipeline, as we seem to have a dependency mess between our compute servers, GDAL, and pip. Our servers, which run Ubuntu LTS, are running a GDAL slightly newer than the one Ubuntu LTS ships, and when I installed that GDAL update it disrupted people's work, as various tools didn't automatically pick up the new library version and needed to be updated manually, so I'm loath to repeat that.

However, the Python world moves forward, and even though we can pin pip libraries at the right version for the version of GDAL we have installed, there are implicit dependencies that move forward too, which means things are fragile. To make matters worse, it seems that over time pip has had two versions of the Python GDAL bindings we need, and the older one works fine, but the newer one doesn't.
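
The first sanity check in this situation (a small sketch - both calls are standard parts of the GDAL Python bindings) is whether the pip-installed bindings match the native library they've linked against:

# Sketch: compare the pip-installed GDAL bindings against the native
# libgdal they're linked to - a mismatch is where the fragility bites.
import importlib.metadata
from osgeo import gdal

bindings = importlib.metadata.version("GDAL")   # the pip package version
library = gdal.VersionInfo("RELEASE_NAME")      # the linked libgdal version
print(f"bindings: {bindings}, library: {library}")
if bindings != library:
    print("warning: pip GDAL bindings don't match the system libgdal")

The usual trick of pinning with pip install "GDAL==$(gdal-config --version)" keeps those two in step, though it does nothing for the implicit dependencies mentioned above.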

All of which meant I had to copy my working pip cache into Alison's pip cache this week just so she could run things on our compute server. And people wonder why I dislike software engineering despite having a degree in it (pop fact: I have an undergrad degree in Software Engineering, not Computer Science).

The correct solution for this is either containers or VMs. Whilst we can run Docker rootless on machines, it's high friction, and some containers, such as postgres/postgis, will not work, as they assume you can change the ownership of files on disk to specific UIDs, for instance. Similarly, this also stops us using Congo as a backing store for Docker things in general, as you can't change ownership on Congo. But I do suspect this will be the path of least resistance, and I can just tweak the Dockerfile for Postgres to stop it being so needy (as a specific example - we'll need to see what tweaks are needed on a case-by-case basis).

I guess I now think that giving people actual accounts on our compute servers just doesn't scale given how tools work these days, and that you want some sort of container service layer built on top. At Bromium we had a thing called "VMaaS", or "VMs as a Service", which would spin up VMs for testing builds of our software on demand - you'd go to a web page, select what OS version and build you wanted, and it'd spin up a VM (either immediately, or when there was next capacity), and away you'd go. I kinda want that for our scientists using our compute servers, but I'm also not offering to write that :)

The week coming

  • Help Patrick with his HotCarbon paper
  • Help Alison with some coding questions she has
  • Chat to Amelia about the possible HotNets submission
  • Ponder/make performance experiments
  • Start to look at the TMF project assessment methodology with an eye to implementation

Tags: weeknotes, swift, python