Weeknotes: 5th June 2023

It’s not a bank holiday, which is probably a shock to us all, so here’s a picture I took a few weeks ago of an Arctic fox to help you chill (finally started processing these photos over the weekend).

DSC06535.jpg

The week in review

Tropical Moist Forest Evaluation Method Implementation

This took up most of my time this last week. I managed to get to the stage where I’m downloading the GEDI shots for a project plus its buffer zone, and then creating a raster of the shots that fall within the project boundary, storing the Above Ground Biomass Density (AGBD) value wherever there is a shot.

So, if we take an example project:

Screenshot 2023-05-19 at 16.50.21.png

I can compute the 30km boundary required by the assessment methodology:

test.tiff
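
For the curious, here’s roughly how that buffer can be computed. This is just a sketch, assuming the project boundary lives in a hypothetical project.geojson, that a reasonably recent geopandas is available, and that buffering in an estimated UTM zone is accurate enough for our purposes:

```python
import geopandas as gpd

# Project boundary in lat/lon; "project.geojson" is a hypothetical path.
project = gpd.read_file("project.geojson")

# Buffering in degrees would be wrong, so reproject to a metre-based CRS first;
# estimate_utm_crs() picks a suitable UTM zone for the project's location.
utm = project.estimate_utm_crs()
buffer_zone = project.to_crs(utm).buffer(30_000).to_crs("EPSG:4326")  # 30 km

buffer_zone.to_file("project_plus_30km.geojson", driver="GeoJSON")
```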

And then I get the GEDI raster of the shots here:

the_gedi_raster.tif

It’s kinda hard to see, so here’s a zoomed-in bit of the top right:

Screenshot 2023-06-05 at 09.35.15.png
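
The real pipeline is built on Amelia’s code, but as a rough sketch of the shape of this step, here’s how you might filter shots to the project boundary and burn the AGBD values into a raster. The file names, column names (lon_lowestmode, lat_lowestmode, agbd), and the pixel size are all assumptions for illustration:

```python
import geopandas as gpd
import numpy as np
import pandas as pd
import rasterio
from rasterio.transform import from_origin

# Hypothetical inputs: a table of GEDI shots pulled out of the downloaded
# granules (column names are assumptions), plus the project boundary.
shots = pd.read_csv("gedi_shots.csv")
project = gpd.read_file("project.geojson")

points = gpd.GeoDataFrame(
    shots,
    geometry=gpd.points_from_xy(shots.lon_lowestmode, shots.lat_lowestmode),
    crs="EPSG:4326",
)

# Keep only the shots that fall inside the project boundary.
inside = gpd.sjoin(points, project, predicate="within")

# Burn the AGBD values onto a regular grid; the pixel size is a guess.
pixel = 0.001  # degrees, roughly 100m at the equator
minx, miny, maxx, maxy = project.total_bounds
width = int(np.ceil((maxx - minx) / pixel))
height = int(np.ceil((maxy - miny) / pixel))
transform = from_origin(minx, maxy, pixel, pixel)

raster = np.full((height, width), np.nan, dtype=np.float32)
cols = np.clip(((inside.geometry.x - minx) / pixel).astype(int).to_numpy(), 0, width - 1)
rows = np.clip(((maxy - inside.geometry.y) / pixel).astype(int).to_numpy(), 0, height - 1)
raster[rows, cols] = inside.agbd.to_numpy(dtype=np.float32)

# Write it out as a GeoTIFF with NaN as the nodata value.
with rasterio.open(
    "gedi_agbd.tif", "w", driver="GTiff",
    height=height, width=width, count=1, dtype="float32",
    crs="EPSG:4326", transform=transform, nodata=np.nan,
) as dst:
    dst.write(raster, 1)
```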


Some observations I made along the way:

  • I had an issue with the NASA API for querying which data files I need for a given area. I could see from Amelia’s code that there’s a restriction on how many points you can have when you define a region of interest, but it turns out there’s also some brokenness in the Cloudflare setup: if you ask for a simple region that has too many GEDI passes over it, you get an HTTP 413 error, because Cloudflare thinks the response from NASA is too big 🤦🏼 So in the end I have to split up the regions I’m interested in into smaller regions before I ask for them (there’s a rough sketch of this after the list), which you can see represented here:

Screenshot 2023-06-01 at 15.34.34.png

  • GEDI data files contain data from a single pass of the ISS over an area, but they include all the data for that run, not just the shots within our area of interest. You can see this in the screenshot below, which shows the shots for the project alongside all the data we had to download just to get the bit we’re interested in. In total, 600K points fall within the project and buffer zone, but we had to download 60M points to get them. That’s 40GB of data on disk, most of which isn’t of interest.

Screenshot 2023-06-01 at 16.32.25.png

  • I made some initial PRs to Amelia’s code as I try to get it to work for this use case. I’m still bodging it to remove Spark from the code though, so I need to chat to Amelia about the best way of moving the common bits somewhere else.
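
As promised above, here’s a rough sketch of the region-splitting workaround. split_region and query_granules are hypothetical names, and the 4×4 grid is a guess rather than anything tuned; the point is just that each sub-box gets queried separately so no single response is big enough to trip the 413:

```python
from shapely.geometry import box

def split_region(minx, miny, maxx, maxy, n=4):
    """Split a bounding box into an n x n grid of smaller boxes.

    Rather than asking NASA for one big region (and tripping Cloudflare's
    413 when too many GEDI passes come back), query each sub-box separately
    and merge the granule lists afterwards.
    """
    dx = (maxx - minx) / n
    dy = (maxy - miny) / n
    return [
        box(minx + i * dx, miny + j * dy, minx + (i + 1) * dx, miny + (j + 1) * dy)
        for i in range(n)
        for j in range(n)
    ]

# Hypothetical usage: query_granules() stands in for whatever wraps the NASA API.
# granules = set()
# for tile in split_region(*project_plus_buffer.total_bounds):
#     granules.update(query_granules(tile))
```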

I also used Amelia’s code to download all the JRC data I need to start computing the AGB numbers the methodology requires. I now have much less free disk space.


Patrick and I did another pass on the methodology document, completing the end-to-end review we started last week. There are now a lot of questions for Tom to answer, but I don’t think anything is blocking progress.

Patrick, Keshav and I had a chat about Google Earth Engine (GEE) dependencies, and we suspect that for an MVP we can get a version of the methodology working that has no dependency on GEE at all, so that will be our approach. Based on my earlier analysis, the only dataset we need that is locked into GEE is the Access To Healthcare dataset, which is derived from the raw data of the Malaria Atlas Project using an undisclosed algorithm. However, we suspect the raw data will substitute easily enough into the matching algorithm, and so we’ll start with that.

This coming week

  • Get the AGB calculations working on the TMF methodology, and start moving towards the pixel matching algorithm, ensuring the data we have is all there as I think it is.
  • Chat to Amelia, assuming she’s back from vacation, about both the GEDI code and how the GEDI data lifecycle might work in our Ark world.

Tags: weeknotes, tmfemi, gedi, gee