Weeknotes: 30th May 2023

The week in review

TMF Methodology

Pairing up with Patrick, I’ve started on implementing the TMF Methodology 1.1. We’re working on this from different ends, with Patrick working on the permanence calculation, and me looking at wrangling the data input side. I was able to get the project polygons out of GEE, which I have converted to a GPKG as I prefer that to GeoJSON for working with - being able to load a GPKG into DB Browser for SQLite, see the structure, and run queries on it is quite helpful. I suspect if I was a proper geospatial person I’d use QGIS for this, but it turns out I’m a database person (much to my surprise).
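
The conversion itself is small enough to sketch here - something like the following, with the file names just placeholders rather than what I actually used:

    # Convert the GeoJSON exported from GEE into a GeoPackage so it can be
    # opened in DB Browser for SQLite and queried. File names are placeholders.
    from osgeo import gdal

    gdal.UseExceptions()

    gdal.VectorTranslate(
        "project_boundaries.gpkg",     # destination GeoPackage
        "project_boundaries.geojson",  # GeoJSON export from GEE
        format="GPKG",
    )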

From this I’ve been able to get the project boundaries, and then use GDAL to do things like rasterise those boundaries:

[Image: test.tiff]

To do this I did a little tweaking of the Yirgacheffe APIs to make it easier to create layers from geometries.
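
Rather than quote the Yirgacheffe change itself, here’s a plain-GDAL sketch of the sort of operation it wraps up - rasterising a project polygon into a GeoTIFF - with the file names, pixel scale, and extent handling all just illustrative:

    # Rasterise a project polygon into a GeoTIFF. File names, pixel scale,
    # and extent handling are illustrative only.
    from osgeo import gdal, ogr

    gdal.UseExceptions()

    PIXEL_SCALE = 0.00026949  # roughly 30m at the equator, in WGS84 degrees

    vector = ogr.Open("project_boundaries.gpkg")
    layer = vector.GetLayer()
    x_min, x_max, y_min, y_max = layer.GetExtent()

    width = int((x_max - x_min) / PIXEL_SCALE)
    height = int((y_max - y_min) / PIXEL_SCALE)

    driver = gdal.GetDriverByName("GTiff")
    target = driver.Create("test.tiff", width, height, 1, gdal.GDT_Byte)
    target.SetGeoTransform((x_min, PIXEL_SCALE, 0.0, y_max, 0.0, -PIXEL_SCALE))
    target.SetProjection(layer.GetSpatialRef().ExportToWkt())

    # Burn a value of 1 into every pixel covered by the project polygon
    gdal.RasterizeLayer(target, [1], layer, burn_values=[1])
    target = None  # close to flush to disk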

The thing I need to enhance here is that I’m projecting the boundaries in WGS84, so they’re not uniform in each axis. Thanks to guidance from Tom and Amelia I know I need to project these into UTM, which gives me a locally uniform projection, after which I can then re-project back to WGS84.
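
The shape of that round trip is roughly this - a sketch rather than the real code, where the UTM zone is picked from the polygon’s centroid and the buffer is just a stand-in for whatever metre-based work needs doing:

    # Sketch of the WGS84 -> UTM -> WGS84 round trip: pick the UTM zone from
    # the polygon's centroid, reproject, do the metre-based work (a buffer
    # here as an example), then reproject back. Not the actual code.
    from osgeo import ogr, osr

    def buffer_in_metres(geometry: ogr.Geometry, distance_m: float) -> ogr.Geometry:
        centroid = geometry.Centroid()
        lng, lat = centroid.GetX(), centroid.GetY()

        # UTM zones are 6 degrees wide; EPSG 326xx is northern, 327xx southern
        utm_zone = int((lng + 180.0) / 6.0) + 1
        utm_epsg = (32600 if lat >= 0.0 else 32700) + utm_zone

        wgs84 = osr.SpatialReference()
        wgs84.ImportFromEPSG(4326)
        wgs84.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
        utm = osr.SpatialReference()
        utm.ImportFromEPSG(utm_epsg)

        projected = geometry.Clone()
        projected.Transform(osr.CoordinateTransformation(wgs84, utm))
        buffered = projected.Buffer(distance_m)  # distance now in metres
        buffered.Transform(osr.CoordinateTransformation(utm, wgs84))
        return buffered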


I’ve also been looking at getting the GEDI data needed for assessing the Above Ground Biomass (AGB) of the project area. For that I’ve started trying to get Amelia’s GEDI download code working, but I’ve hit a couple of hiccups as I try to get it running locally - there are a couple of assumptions in the setup that work on sherwood but not elsewhere, so I’ll fix those and contribute the fixes back to Amelia.
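
For reference, this is the general shape of fetching GEDI L4A granules for an area - not Amelia’s code, just a sketch assuming the earthaccess package, with the collection short name and bounding box being assumptions on my part:

    # Not the actual downloader - a sketch of pulling GEDI L4A granules for a
    # project area using earthaccess. Short name and bounding box are assumed.
    import earthaccess

    earthaccess.login()  # uses your NASA Earthdata credentials (e.g. ~/.netrc)

    granules = earthaccess.search_data(
        short_name="GEDI_L4A_AGB_Density_V2_1",     # GEDI L4A AGB product (assumed name)
        bounding_box=(-55.2, -12.1, -54.1, -11.0),  # west, south, east, north
        temporal=("2020-01-01", "2021-12-31"),
    )
    earthaccess.download(granules, "gedi_granules/")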


Patrick and I sat down and did a pass to make sure we knew how all the inputs link into the algorithm, and found more things that needed pinning down in the methodology document. We got about halfway through this, and we’ll try to finish that pass early this week.

Ecology calculations performance issues no 1

After trying to juggle Tom B’s jobs between our various compute servers, in the end I just accepted my lot and wrote a replacement for gdalwarp for him - it seems you can disrupt my work schedule and get me to work for you if you just try to destabilise the machines I am guardian for :)

I’ve no idea what gdalwarp was doing to be so inefficient, but Tom was just downscaling some GeoTIFFs, and my minimal re-write seemed to speed them up by between one and two orders of magnitude without trying to be clever. Despite the performance win, and despite me not loading more than one row of pixels at a time, Tom reported that memory usage was still quite high, but that wasn’t the case when I ran the same code locally. I pondered this, and then recalled that for the AoH code I had to explicitly tell GDAL not to cache chunks of data, and adding this again to the downsample script I made for Tom fixed it. It seems that each GDAL instance just assumes it can use most of physical memory as a cache if it wants by default.
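
Not the actual script I wrote for Tom, but a minimal sketch of the same two ideas - cap GDAL’s block cache, and only ever hold one row of pixels at a time - with the scale factor and file names made up:

    # Cap GDAL's block cache so a single process can't grab a large slice of
    # RAM, then do a simple nearest-neighbour downsample one output row at a
    # time. Scale factor and file names are illustrative.
    from osgeo import gdal

    gdal.UseExceptions()
    gdal.SetCacheMax(64 * 1024 * 1024)  # limit GDAL's block cache to 64MB

    SCALE = 10  # shrink both axes by a factor of 10

    src = gdal.Open("input.tif")
    band = src.GetRasterBand(1)
    out_width = src.RasterXSize // SCALE
    out_height = src.RasterYSize // SCALE

    driver = gdal.GetDriverByName("GTiff")
    dst = driver.Create("output.tif", out_width, out_height, 1, band.DataType)
    geo = list(src.GetGeoTransform())
    geo[1] *= SCALE  # stretch pixel size to match the reduced resolution
    geo[5] *= SCALE
    dst.SetGeoTransform(geo)
    dst.SetProjection(src.GetProjection())
    dst_band = dst.GetRasterBand(1)

    for y in range(out_height):
        # Read a single source row and keep every SCALE-th pixel from it
        row = band.ReadAsArray(0, y * SCALE, src.RasterXSize, 1)
        dst_band.WriteArray(row[:, ::SCALE][:, :out_width], 0, y)

    dst = None  # close to flush everything to disk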

Anyway, the net result was that Tom was able to wrap up his work quickly enough after I made the script, so yay.

Ecology calculations performance issues no 2

As a Friday afternoon problem I started to have a look at why the H3 calculations seem to be so slow for Alison. Originally she reported they were slow, and that led us to discover that on the compute server she was using the network access was significantly broken, so I thought I’d dig into why she still seems to be having issues.

It occurred to me that one reason might be that the GeoTIFFs we generate as part of the AoH calc stage were all compressed to save disk space - not that we’re short on disk space, but that’s just what the original code did before I got involved and I never questioned it. For the hex tile work I wondered if this was causing us trouble - the hex tiles are all about 13x13 pixels, and the AoH GeoTIFFs are anything up to 400k pixels wide, which could lead to a lot of decompression being done repeatedly per tile.
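
Concretely, the access pattern I was worried about is lots of tiny window reads against a huge compressed file, something like this (the file name, offsets, and window size are made up):

    # Each hex tile becomes a tiny window read from a huge AoH GeoTIFF; with
    # a compressed file every such read can force a whole internal block to
    # be decompressed. Name, offsets, and window size are illustrative.
    from osgeo import gdal

    gdal.UseExceptions()

    aoh = gdal.Open("species_aoh.tif")  # placeholder, up to ~400k pixels wide
    band = aoh.GetRasterBand(1)

    for x_offset, y_offset in [(120000, 45000), (120013, 45000), (120026, 45000)]:
        window = band.ReadAsArray(x_offset, y_offset, 13, 13)
        # ... aggregate this window into the matching H3 tile ...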

I started to write a naive version of the H3 code that worked the inverse way around from the current implementation to test this, but then I realised I could just test my theory by decompressing a bunch of the AoH data and re-running the original code. This showed me that not only was my theory wrong, but things actually ran faster with the compressed GeoTIFFs, which makes little sense to me given that the data is very noisy - I could understand things being faster for a very uniform file that compresses well.
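
The test itself was just a matter of re-writing a handful of the AoH files without compression and pointing the original code at those copies - along these lines, with the paths as placeholders:

    # Re-write a batch of AoH GeoTIFFs with compression turned off so the
    # original H3 code can be re-run against them. Paths are placeholders.
    import glob
    import os

    from osgeo import gdal

    gdal.UseExceptions()
    os.makedirs("aoh_uncompressed", exist_ok=True)

    for path in glob.glob("aoh/*.tif"):
        destination = os.path.join("aoh_uncompressed", os.path.basename(path))
        gdal.Translate(destination, path, creationOptions=["COMPRESS=NONE"])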

I thought perhaps that’s just an indicator that the network is still the bottleneck, and the decompression overhead is outweighed by the slowness of accessing the NFS storage from this compute server, but copying a couple of the files to local disk also showed me that:

  1. Compressed was still faster than uncompressed!?
  2. Accessing the NFS store was slower than local - adding about 20% to the run time.

I wanted to re-run this on one of the other compute servers to check if it’s just this one node’s network that makes NFS so slow to access, but both machines were busy on Friday, so I’ll try to slip that in this week when one of them has a quiet moment. But the compressed versus uncompressed thing still confuses me, so I’ll need to ponder that one some more as time allows.

This coming week

  • TMF methodology is still my top priority - I want to get the part of the methodology that combines the GEDI data and the project boundaries done this week.
  • Look into the H3 weirdness some more

Tags: weeknotes, life, yirgacheffe