Weeknotes: 13th October 2025

Last week

PROPL Talk Prep

The main focus of the week was to pull together a talk for PROPL to go with my paper on Yirgacheffe that will be published there. The workshop is in Singapore and part of a large conference that lasts all week, but I'm mostly just interested in the one workshop day, so I'll be attending and presenting remotely. Given both the remote presentation and that, because of timezones, I'm scheduled to talk at the end of the day there, Anil has suggested I give a demo, which I thought was an excellent way to try to help keep people more engaged.

The small detail was that I didn't have a demo, as mostly I use Yirgacheffe as the base layer of large ecology pipelines that take days to run. On top of that, the datasets I work with aren't particularly snappy, being in the hundreds of gigabytes. Still, given I know I need to document Yirgacheffe better, and add examples to that documentation, I figured this was a good challenge.

The first step was to add some visualisation support to Yirgacheffe, based on what I'd seen rasterio do, where in a Jupyter notebook you can call raster.show() and a visualisation of that raster just magically appears! Such a cool and useful feature, and so I felt it was time to make like the old Steve Jobs quote.

Looking under the hood of rasterio, it turns out that they're using matplotlib, which Jupyter works with natively, but which you can also use standalone in regular Python. So I followed their lead, and after a little while was able to make some visualisations of how AOH is pulled together:

A diagram showing six maps of the Philippines arranged in a 3 by 2 grid, illustrating the step-by-step process of calculating Area of Habitat: habitat types (multicoloured classification map), filtered habitat (binary mask of suitable habitat), range (species distribution in white), elevation (gradient from blue to white showing topography), filtered elevation (binary mask of suitable elevations), and area of habitat (final result showing intersection of all criteria in white on black).
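
For anyone who prefers code to pictures, the steps in that figure can be sketched with plain Python lists standing in for rasters. This is only a toy illustration and not Yirgacheffe code: the grids, the suitable habitat codes, and the elevation limits are all made-up values, and area_of_habitat is a hypothetical helper name.

```python
# Toy sketch of the Area of Habitat steps, using nested lists as tiny rasters.
# Every value and name here is invented for illustration.

HABITAT = [  # habitat type code per pixel
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [4, 2, 2, 1],
]
ELEVATION = [  # metres above sea level per pixel
    [10, 200, 450, 900],
    [50, 300, 600, 1200],
    [5, 150, 700, 100],
]
RANGE = [  # 1 where the species is known to occur, 0 elsewhere
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

SUITABLE_HABITATS = {2, 3}  # assumed habitat preferences
MIN_ELEVATION, MAX_ELEVATION = 100, 800  # assumed elevation limits


def area_of_habitat(habitat, elevation, species_range):
    # Step 1: binary mask of pixels with a suitable habitat type.
    filtered_habitat = [
        [int(h in SUITABLE_HABITATS) for h in row] for row in habitat
    ]
    # Step 2: binary mask of pixels within the elevation limits.
    filtered_elevation = [
        [int(MIN_ELEVATION <= e <= MAX_ELEVATION) for e in row]
        for row in elevation
    ]
    # Step 3: multiply the masks with the range, so a pixel only
    # survives if it passes all three criteria.
    return [
        [h * e * r for h, e, r in zip(h_row, e_row, r_row)]
        for h_row, e_row, r_row in zip(
            filtered_habitat, filtered_elevation, species_range
        )
    ]


aoh = area_of_habitat(HABITAT, ELEVATION, RANGE)
for row in aoh:
    print(row)
```

The final grid is the equivalent of the white-on-black panel in the figure: a pixel is 1 only where suitable habitat, suitable elevation, and range all overlap.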

For the demo, given I've never really taken to using Python notebooks, I was going to just run code in a terminal and pop up matplotlib windows, but I think I'm hitting an issue with the newest version of macOS on my laptop where I get ghost windows blocking input, and juggling terminals and these ghost windows was getting in the way. So in the end I decided to invest a little time learning how to use Jupyter. To cut a long story short, I didn't get on well with Jupyter, and so I asked for advice on the Nordic-RSE chat channel, as I know that a few people there have been using notebooks to teach people coding. Luca Ferranti pointed me towards Marimo, which is a more modern take on Python notebooks:

  • The notebooks are more readily editable as code and are more version control friendly.
  • The UI is reactive, so as you update one cell, dependent cells update, which makes for a good demo.
  • Overall the look is more modern, which is a silly superficial thing, except I'm giving a presentation/demo, so that is kinda important.

A screenshot of a Marimo notebook titled "Demo 1: Sentinel-2 cloud removal" showing two side-by-side satellite imagery plots: the left displays a Vegetation Red Edge 2 band with blue-green colour gradient, and the right shows a Scene Classification Layer in grayscale with white patches indicating cloud coverage.

The only fiddly bit I found was controlling lazy vs eager evaluation in Marimo. I want people to see that I'm running things manually, and I did manage to do that with lazy mode, but as far as I could tell you can't do that if you switch to the presenter mode. But still, I could do live code edits and the updates rippled through, which was a fun way to show liveness.

Overall, I'm pleased with the demos I have made; I just need to find a way to host them at some point so people can play with them (except the "full 100m data layer" one that needs 300GB of data :).

Yirgacheffe

Other than adding the show function as described above, when I was thinking of examples for the talk, I did wonder about summing rasters, or even an endemism which does more fun math with NaNs in it. Normally I end up writing something clunky like this for these examples:

list_of_rasters = ...

total = list_of_rasters[0]
for raster in list_of_rasters[1:]:
    total += raster

Which is quite hokey, and I thought wouldn't go down well in a conference full of functional programmers. Really what I wanted was something more like a fold operation, only Python doesn't really have a fold. Or so I thought: it turns out for this particular problem it has something reasonably close, called reduce:

>>> from functools import reduce
>>> from operator import add
>>> reduce(add, [1, 2, 3])
6

What is fun is that because Yirgacheffe uses operator overloading for its types, the exact same code works using Yirgacheffe layers! So all I had to do to add support for a proper Pythonic way to sum up a list of layers was add a unit test :)

from functools import reduce
import operator

import numpy as np

from yirgacheffe.layers import RasterLayer
from tests.helpers import gdal_dataset_with_data

def test_add_similar_layers() -> None:
    data = [
        np.array([[1, 2, 3, 4], [5, 6, 7, 8]]),
        np.array([[10, 20, 30, 40], [50, 60, 70, 80]]),
        np.array([[100, 200, 300, 400], [500, 600, 700, 800]]),
    ]

    layers = [RasterLayer(gdal_dataset_with_data((0,0), 1.0, x)) for x in data]

    summed_layers = reduce(operator.add, layers)
    actual = summed_layers.read_array(0, 0, 4, 2)

    expected = reduce(operator.add, data)

    assert (expected == actual).all()

I felt that was a nice outcome to that particular line of thought. I didn't use this in the demo in the end, but I'm glad it caused me to learn this. I just need to now switch all those clunky raster aggregation code patterns I have in my various pipelines to use reduce at some point.
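
To show why operator overloading is all reduce needs, here is a minimal sketch using a toy class. ToyLayer is a hypothetical stand-in that I've made up for this example, not Yirgacheffe's RasterLayer, and it only implements pixel-wise addition over nested lists.

```python
from functools import reduce
import operator


class ToyLayer:
    """Hypothetical stand-in for a raster layer, for illustration only."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __add__(self, other):
        # Pixel-wise addition that returns a new ToyLayer, which is what
        # lets reduce feed each partial sum into the next addition.
        return ToyLayer(
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.rows, other.rows)
        )


layers = [
    ToyLayer([[1, 2], [3, 4]]),
    ToyLayer([[10, 20], [30, 40]]),
    ToyLayer([[100, 200], [300, 400]]),
]

# operator.add(a, b) is just a + b, so it dispatches to ToyLayer.__add__.
total = reduce(operator.add, layers)
print(total.rows)
```

One thing to watch for when replacing the loop pattern: without an initial value, reduce raises a TypeError on an empty list, so pipelines that might legitimately have no inputs need to handle that case separately.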

Whilst I was at it I also finally added mypy type checking support to all the unit tests in Yirgacheffe, which caught a couple of silly asserts that always passed as I had confused a property with a method, and was just testing that the method exists rather than the result of the method call 🤦 A silly oversight and why mypy is such a good tool (thankfully no bugs were hidden by this, but I'm glad they're fixed for when future me gets something wrong).
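
As an illustration of that class of mistake, here is a small sketch. The ToyLayer class and its is_empty method are hypothetical names invented for this example, not anything from Yirgacheffe; the point is only that a bound method object is always truthy, so asserting on it without calling it can never fail.

```python
class ToyLayer:
    """Hypothetical class for illustration only."""

    def __init__(self, values: list[int]) -> None:
        self.values = values

    def is_empty(self) -> bool:
        return len(self.values) == 0


def buggy_check() -> None:
    layer = ToyLayer([1, 2, 3])
    # Bug: the missing parentheses mean this asserts on the bound method
    # object itself, which is always truthy. The layer is clearly not
    # empty, yet this assert passes.
    assert layer.is_empty


def fixed_check() -> None:
    layer = ToyLayer([1, 2, 3])
    # Calling the method tests the actual result.
    assert not layer.is_empty()


buggy_check()  # passes, even though it asserts the wrong thing
fixed_check()
```

At runtime both checks pass silently, which is why this is easy to miss; a type checker has a chance to spot that the first assert is testing a function rather than a bool.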

Habitat maps

I did make a little progress on running the Woodman et al algorithm for integrating coarse data into habitat maps for the hybrid habitat map I'd made using the Jung habitat map combined with coarse but more up to date agricultural data from GAEZ and HYDE. Mostly what I learned was that I had misunderstood the inputs to their LandScaleR R package, and so the results I generated weren't that good. I need to do some more coding on this and try it again.

Blog meta

Adrian McEwan had pointed out when I last had a paper published that there was a bogus entry in my RSS feed for said paper, due to how I failed to filter publication pages on my website out from the RSS feed. Adrian was quick to point out he didn't want it removed, he just wanted to know more about the publication, and so I've finally spent half an hour remembering how my website works, and fixing it so that publications both have a page of their own with the abstract on, and that page is properly in the RSS feed.

This is slightly complicated by how I generate a lot of pages like the ones for publications based on the frontmatter metadata: that is how I store the conference, authors, etc., and only the abstract is in the body of the page. Thus I needed to plumb through how pages are rendered in Webplats and then fix that in the code for this particular website.

This week

  • PROPL is on Monday, and whilst giving my talk is mostly what is on my mind, I am looking forward to watching the other presentations (just not getting up super early to do so).
  • Try to do another pass on the habitat maps work
  • Do a run of my updated STAR code
  • Try to clear my email backlog - I was a bit consumed with PROPL prep and got a bit behind on non-essential things

Tags: propl, yirgacheffe