Weeknotes: 16th October 2023

Last week

Tropical Moist Forest Evaluation Methodology Implementation

Last week most coding time was spent on the Tropical Moist Forest Evaluation Methodology Implementation (TMFEMI). The focus was on issues reported with the coarsened proportional coverage (CPC) versions of the JRC TMF land usage class data.

On investigation I realised that some assumptions I’d made about the JRC datasets were not correct. Firstly, I’d assumed that, being a tiled dataset, all the images would be the same size. This is not true: some images are a little wider or taller than others, which meant that the optimised rendering path I have for tiled layers didn’t work when I tried to make a single large image from all the CPC layers.
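
As a rough illustration (this isn’t the actual pipeline code, and the directory path is made up), the check that exposes this is just opening each tile with GDAL and collecting the distinct pixel dimensions:

```python
# A minimal sketch, not the project's actual code: open every JRC tile with
# GDAL and collect the distinct pixel dimensions. The glob path is made up.
import glob
from osgeo import gdal

sizes = set()
for path in glob.glob("jrc_tiles/*.tif"):
    dataset = gdal.Open(path)
    sizes.add((dataset.RasterXSize, dataset.RasterYSize))

# With a truly uniform tiling this set would contain a single entry;
# for the JRC data it turns out not to.
print(sizes)
```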

The second is that, back when Miranda was also working on the methodology, I’d looked at several of the tiles and they all had a clear overlap zone: adjacent tiles had several pixels on each side that spilled into the next tile, which we need to account for. However, as you move towards (0, 0) that’s no longer the case, which broke another assumption: that I’d always have some buffer of pixels around the edge of any tile.
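
Again just as a sketch with hypothetical filenames, the overlap between two horizontally adjacent tiles can be measured from their geotransforms, and near (0, 0) it comes out as zero:

```python
# Hypothetical filenames; a rough way to measure the overlap between two
# horizontally adjacent tiles from their geotransforms.
from osgeo import gdal

left = gdal.Open("tile_a.tif")
right = gdal.Open("tile_b.tif")

left_gt = left.GetGeoTransform()
right_gt = right.GetGeoTransform()

# Right-hand edge of the left tile and origin of the right tile, in degrees.
left_edge = left_gt[0] + left_gt[1] * left.RasterXSize
right_origin = right_gt[0]

# Overlap expressed in source-resolution pixels; for the tiles near (0, 0)
# this is zero, so there is no buffer to rely on.
overlap_pixels = (left_edge - right_origin) / left_gt[1]
print(overlap_pixels)
```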

In the end I switched our CPC generation to produce a single whole-globe dataset once, rather than trying to make scaled tiles that match the original JRC tiling, as these “non tiled” tile issues no longer exist once that raster has been made.
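
Something like the following GDAL-based sketch captures the idea (the paths and creation options are illustrative, not the real pipeline’s):

```python
# A sketch of building the single whole-globe raster once using GDAL's
# virtual mosaic support; paths and options here are illustrative only.
import glob
from osgeo import gdal

tiles = glob.glob("jrc_tiles/*.tif")

# The VRT is just a lightweight index over the tiles, so this step is cheap.
vrt = gdal.BuildVRT("jrc_global.vrt", tiles)
vrt.FlushCache()

# Materialising it into one GeoTIFF is the expensive part, but it only happens
# once; after that, tile sizes and missing overlaps are no longer a concern.
gdal.Translate(
    "jrc_global.tif",
    "jrc_global.vrt",
    creationOptions=["COMPRESS=LZW", "BIGTIFF=YES"],
)
```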

[Image: coarse_2022_1.jpg]

When looking at the CPC layers after these changes, we still found that we didn’t match Tom’s data, and that’s when we discovered that, whilst we both thought we were working with the 2021 JRC release data, we actually had different datasets, with marked differences in certain land classes. Upon investigation it looks like the 2021 dataset JRC published to Google Earth Engine (GEE) does not match the version we downloaded in 2021. Of course, because JRC don’t provide any provenance information about the data, we can’t know why these datasets differ.

Thankfully the 2022 release we have downloaded and the one JRC have published to GEE do match, so we have now moved to using this release; but it means both we and Tom have to re-run all our analysis, and any old comparisons are suspect.

And finally, Tom managed to find a village in the Gola Rainforest region via CPC comparison :) The CPC layers are just averaged down to a 1.2 km pixel size (at the equator) from a 30 m source. For some reason Tom’s calculations had a much lower minimum pixel value than the layers I was generating, and when we had a look, it turned out that there was a dead zone in the LUC data that Tom was averaging into one pixel and we were not:

[Image: Screenshot 2023-10-12 at 16.58.35.png]

The difference is just one of offsetting: we both used a similar averaging technique, but because my code started at a slightly different lat/lng to Tom’s, this dead zone was distributed over multiple pixels rather than falling in one. Out of curiosity I wondered what was causing the gap in the data, and it turned out to be a deforested region next to a small town:

[Image: Screenshot 2023-10-16 at 10.34.37.png]

It’s always nice to know where these anomalies come from. But it does point out that we may need to do some filtering of outlier values, as when averaging data there’s always a risk of hitting something like this. In this case Robin is already working on a better averaging system, so in future this won’t be an issue for this particular dataset.
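
To show why a small offset changes the minimum value so much, here is a toy example (made-up data, not the real CPC code) of block averaging a fine mask down by a factor of 40 from two different starting offsets:

```python
# Toy data, not the pipeline: block-average a fine land-class mask down by a
# factor of 40 (roughly 30 m -> 1.2 km), starting from two different offsets,
# and see how the minimum coarse value changes.
import numpy as np

def block_average(fine: np.ndarray, factor: int, offset: int = 0) -> np.ndarray:
    """Average factor x factor blocks of a 2D mask, starting `offset` pixels in."""
    trimmed = fine[offset:, offset:]
    height = trimmed.shape[0] // factor * factor
    width = trimmed.shape[1] // factor * factor
    blocks = trimmed[:height, :width].reshape(height // factor, factor, width // factor, factor)
    return blocks.mean(axis=(1, 3))

mask = np.ones((200, 200))   # stand-in for a fine-resolution land-class mask
mask[80:120, 80:120] = 0     # a dead zone exactly one coarse pixel in size

print(block_average(mask, factor=40, offset=0).min())   # 0.0: dead zone fills one coarse pixel
print(block_average(mask, factor=40, offset=20).min())  # 0.75: same dead zone split over four pixels
```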


One final thing that seems to be causing confusion is that my library for processing raster datasets, Yirgacheffe, normalises pixel offsets to a virtual grid upon loading/saving. I do this because in GeoTIFF you can have any origin offset you like, but if we’re going to compare raster layers you really want the origins of all the pixel layers to align to a virtual grid defined by the pixel size. For the AoH work I forced the origin of that virtual pixel space to be at (0, 0) and shuffled all data to align with that.

However, when comparing the data we generate with the data Tom generates in GEE, this is causing people to flag the alignment as the reason we’re getting wrong results, something I dispute, but I can see why it raises suspicion. As such I’m going to make this forced alignment optional, and just have Yirgacheffe refuse to work if you haven’t asked it to align things and your datasets aren’t already aligned.
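
This isn’t Yirgacheffe’s actual API, but as a sketch, the two behaviours, snapping an origin onto a (0, 0) anchored virtual grid versus refusing layers that aren’t already on it, look roughly like this:

```python
# Not Yirgacheffe's real API, just a sketch of the two behaviours: snap an
# origin to a (0, 0)-anchored virtual grid, or refuse layers that aren't
# already on it.
def snap_to_grid(origin: float, pixel_size: float) -> float:
    """Nearest grid-aligned coordinate for a layer origin."""
    return round(origin / pixel_size) * pixel_size

def check_aligned(origin: float, pixel_size: float, tolerance: float = 1e-9) -> None:
    """Raise if the origin does not already sit on the virtual grid."""
    if abs(origin - snap_to_grid(origin, pixel_size)) > tolerance:
        raise ValueError(f"Origin {origin} is not aligned to a {pixel_size} degree grid")

pixel_size = 30.0 / 111_320.0            # roughly 30 m expressed in degrees at the equator
aligned = snap_to_grid(-73.4217, pixel_size)
check_aligned(aligned, pixel_size)       # fine: the origin has been shuffled onto the grid

try:
    check_aligned(-73.4217, pixel_size)  # the stricter mode: unaligned data is rejected
except ValueError as error:
    print(error)
```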

IUCN AoH

Alison and I had a chat with Simon Tarr and Kate Harding from IUCN about using our AoH pipeline to generate rasters for their STAR biodiversity evaluations. In general everyone was happy for this to progress, but first we needed to solve a few technical issues thrown up the last time we tried to generate AoH rasters to match those the IUCN generated.

The first of those, caused by the habitat-type crosswalk table being incorrect, has been fixed, and now my test species raster looks close to spot on compared with the IUCN data:

[Image: Screenshot 2023-10-12 at 15.15.13.png]
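
For anyone unfamiliar with the term, a crosswalk table just maps IUCN habitat codes onto the class codes used in the land-cover raster so that a species’ habitat preferences can become a pixel mask; the pairings below are made up for illustration and are not the corrected table:

```python
# Purely illustrative: these habitat-to-class pairings are made up and are not
# the corrected table. A crosswalk maps IUCN habitat codes onto the class
# codes used in the land-cover raster, so a species' habitat preferences can
# be turned into a pixel mask. A wrong row here quietly shifts the AoH result.
crosswalk = {
    "1.6": [106],    # hypothetical: subtropical/tropical moist lowland forest -> class 106
    "1.9": [109],    # hypothetical: subtropical/tropical moist montane forest -> class 109
    "14.3": [1403],  # hypothetical: plantations -> class 1403
}

def classes_for_habitats(habitat_codes: list[str]) -> set[int]:
    """Collect every raster class that a species' habitats map onto."""
    return {cls for code in habitat_codes for cls in crosswalk.get(code, [])}

print(classes_for_habitats(["1.6", "1.9"]))
```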

Now I need to re-run our code for a small number of other species to check that they also work and that Simon is happy with those. If so, Kate will generate new input data for all the species they want to evaluate, and we can do a manual run for that.

The eventual aim is that this will be an automated/repeatable pipeline, similar to the TMF setup.

This coming week

  • TMFEMI: Sadiq and Robin did some great performance work, which is currently sitting on branches and needs to get into the main branch, as we have too many branches built on branches right now.
  • LIFE: Ali has asked me to help identify the root cause of a few outlier results from the last full batch.
  • IUCN: run more species and get those results to Simon

Tags: weeknotes, life, tmfemi, aoh