Weeknotes: 5th May 2025

Last Week

OCaml GeoTIFF progress

I made some good progress on building on Patrick's and George's work with the OCaml GeoTIFF library:

  • Added reading of LZW-compressed data
  • Added support for more pixel formats
  • Added support for reading from different planes within a file
  • Added some unit tests

That last one turned out to cause some trouble, and I'm grateful to Patrick for his help in fixing things. Whilst the tests ran locally, they were failing in CI, apparently because both OUnit2's test runner and EIO, which I was using to get data for the tests, were using fork, and double forking is often a recipe for trouble.

Patrick and I also had some discussion about performance if you're not using EIO. The TIFF library's interface for reading data is based on Cstruct, which I assume is to align with what EIO uses. But if you're not an EIO user, and indeed if you're coming from a "new-to-OCaml" world, then you'll be looking to load data with In_channel. That presents a problem, as the best you can do via In_channel is load the data into a bytes value, copy it to a Cstruct value, and then have the TIFF library consume it. Patrick kindly spent some time coming up with a more direct interface for those not using EIO.

This was nice. Although I was using EIO for the unit tests, for manual testing I was hooking up the library to a simple Claudius-based visualiser I have for geo-data, making it work with GeoTIFFs. That visualiser isn't using EIO or such yet, and so Patrick's fix made loading data for it a lot nippier.

Here I'm visualising one of the elevation maps we use in the LIFE pipeline. The tool I'm using is not really that usable yet, but it's a slow burn project to let me load 3D data in actual 3D: it does load GeoJSON and CSV data already, and now with GeoTIFF perhaps it'll be almost useful enough I'll start to put some effort into it. It clearly isn't a high quality rendering, but a quick visualisation like this is great for telling me that I'm extracting not just the image data but also the right geospatial data with TIFF, and in future it'll be a useful sanity-check tool for the pipelines I work on.

LIFE

I generated some new scenario versions of LIFE as needed by Ali for some investigations she was doing into how to present the LIFE metric. It does lead me to think we need a guide covering not just how to run LIFE but how to alter it to make certain experiments. Ali has already started on a methodology guide; perhaps we also need a method guide (and a hat tip to Tom Swinfield for educating me recently on the difference between those two terms). The downside of this is it's just yet another thing to do and we're all quite busy.

STAR

Simon Tarr has finally tried running my STAR implementation, which is great news. Inevitably, as the first person who isn't me to try running it, he hit some issues, but we can hopefully now just play the game where I fix a thing and he runs it until we hit the next issue.

The one big thing that he hit, not having a compute server as big as the one I tend to use, concerns a bunch of the base layers that we need to resize/reproject. These don't change over time and aren't a variable in the STAR method, so you calculate them once and never again, but they are super slow to calculate. To save Simon some time, once he had demonstrated that they started running, I just uploaded all the results to our shared cloud storage, as they're not that big. I think in general though we should push them to Zenodo, so that others can skip this stage too.

Anyway, great news that we've started this, and Simon and I plan to sit down together in the DAB this coming week to try to get through the rest of the issues.

Den Stora Älgvandringen är över

This year's Great Moose Migration has come to a close, with 70 meese swimming over the river at the area near the cameras as they migrate north. It was an interesting one, as spring was very early this year, so they had to start the stream a week early, as the ice had already melted and meese were starting to be in the area. Indeed, most swam within that first week or so, and very few in the final week. This was the opposite of 2023, when spring was very late, and on the date of the official close no meese had swum, so they had to extend it a week the other way.

It was a fun few weeks, and I have a plan for a geospatial-related hack for next year's event, so hopefully I'll find a little time for that in the latter half of the year.

This Week

OCaml GeoTIFF

On the OCaml GeoTIFF side of things, writing data is the next big thing to tackle if this is to be a usable tool. TIFF is not a great format from that perspective, as its flexibility leads to a bunch of challenges whereby the file itself can suffer internal fragmentation. TIFF data is stored in strips whose offsets and lengths are held in a directory, which is fine if your data is uncompressed and the length of those strips is constant. But if your data is compressed, then the length of each strip depends on the data it holds. So if you modify data in an existing image, either the strip shrinks, leaving dead space in the middle of the file, or it grows and no longer fits, so you need to relocate it to the end of the file, and now you have even more dead space in the middle of the file. You can compact the file, but on a 150GB file that's a lot of data churn if you modify the first strip...
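
To make the fragmentation problem concrete, here is a toy model of it in Python. It is only a sketch of the bookkeeping: it does no real TIFF parsing, and the ToyTiff class, its rewrite and compact methods, and the 8-byte header size are all made up for illustration rather than taken from the OCaml library. It also assumes the simplest possible policy, where a strip that grows always moves to the end of the file.

```python
from dataclasses import dataclass


@dataclass
class Strip:
    offset: int  # byte position of the strip within the file
    length: int  # compressed size of the strip in bytes


class ToyTiff:
    """Hypothetical model of strip placement only, not a TIFF reader."""

    def __init__(self, strip_lengths, header_size=8):
        self.header_size = header_size
        self.strips = []
        pos = header_size
        # Lay the strips out back-to-back after the header.
        for length in strip_lengths:
            self.strips.append(Strip(pos, length))
            pos += length
        self.end = pos  # current end of file
        self.dead = 0   # bytes no longer referenced by any strip

    def rewrite(self, index, new_length):
        """Replace one strip's compressed data with data of a new size."""
        strip = self.strips[index]
        if new_length <= strip.length:
            # Fits in place: the unused tail of the old slot is dead space.
            self.dead += strip.length - new_length
            strip.length = new_length
        else:
            # Too big: the whole old slot becomes dead space and the
            # strip is appended at the end of the file.
            self.dead += strip.length
            strip.offset = self.end
            strip.length = new_length
            self.end += new_length

    def compact(self):
        """Repack strips back-to-back; return how many bytes had to move."""
        moved = 0
        pos = self.header_size
        for strip in self.strips:
            if strip.offset != pos:
                moved += strip.length
                strip.offset = pos
            pos += strip.length
        self.end = pos
        self.dead = 0
        return moved


tiff = ToyTiff([100, 100, 100])
tiff.rewrite(0, 80)    # shrinks: 20 dead bytes left behind
tiff.rewrite(1, 150)   # grows: relocated, 100 more dead bytes
print(tiff.dead, tiff.end)  # 120 458
print(tiff.compact())       # 250
```

Even in this tiny case, tidying up after two edits means moving 250 of the 330 live bytes, because every strip after the first hole has to shuffle along. Scale that up to a 150GB file and you can see why compaction isn't something you want to do casually.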

STAR and LIFE

Specific things:

  • Sit down with Simon and get him running my STAR code.
  • We have another LIFE meeting around future work, and for once I think I've done all my action items for this one!

On a more general note though, for both I need to complete the Dahal et al. validation method, which requires using occurrence data from GBIF. We've been mirroring GBIF locally, so I need to work with Anil to get access to that so I can start using it.

Tags: yirgacheffe, ocaml, life