Weeknotes: 16th June 2025
Previous week (at work)
I was on vacation last week driving around the Netherlands on my motorbike with my partner, so this is mostly what I did the week before that. The Netherlands was lovely, and their cycle-biased road system puts Cambridge to shame: most of the places we stayed had push-bikes for us to use, and I felt safer cycling there on atypical-to-me roadways than I did cycling to meet Anil this morning on home turf.
We also had fun learning about the land management in the Netherlands: we walked on the sand motor, an artificial sandbank that is an experiment in reinforcing the coast; we drove over Afsluitdijk, a 20-mile-long dyke that separates the open sea from the IJsselmeer lake; we visited the museum at Biesbosch National Park, where we learned about what happens if you don't maintain your wetlands infrastructure; and we rounded it off with a guided tour of Maeslantkering, a huge set of swinging doors at Hoek van Holland for blocking the sea from reaching Rotterdam if the sea level looks like it'll swell too high. The doors are sufficiently large that, if it wasn't for the special glass-impregnated white paint, they would expand 70 cm in the sun (as it is, the paint limits that to "just" 30 cm). Again, we do water management in East Anglia, but it's just at another scale in the Netherlands (I guess that's important when the centre of the country is six metres below sea level).
Area of Habitat Edge Effects
I spent some time trying to get my head around how to implement edge effects for Area of Habitat maps as part of LIFE. Edge effects refer to the fact that species that occupy certain habitats will sometimes not actually exist all the way to the edge of that habitat. If you have a habitat a species likes surrounded by habitat(s) it doesn't like, you can effectively shrink the habitat inwards by a set amount to allow for where they will not venture, making the population more concentrated within the inner region. If areas of a habitat are sufficiently small then the species may not live there at all, despite it being a type they prefer.
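To make the shrinking idea concrete, here is a minimal sketch of it on a toy grid. It is not the LIFE implementation: real AoH maps are rasters, the edge distance would be species-specific, and the names `core_cells`, `big_patch` and `small_patch` are made up for illustration. I'm also assuming a square neighbourhood, so a cell only counts as core if every cell within `edge_width` of it is also habitat.

```python
from itertools import product


def core_cells(habitat, edge_width=1):
    """Return the habitat cells that are at least `edge_width` cells away
    from any non-habitat cell (square neighbourhood, an assumption)."""
    offsets = list(product(range(-edge_width, edge_width + 1), repeat=2))
    return {
        (r, c)
        for (r, c) in habitat
        if all((r + dr, c + dc) in habitat for dr, dc in offsets)
    }


# A 5x5 patch of suitable habitat, plus a separate 2x2 patch nearby.
big_patch = set(product(range(0, 5), range(0, 5)))
small_patch = set(product(range(0, 2), range(7, 9)))
habitat = big_patch | small_patch

core = core_cells(habitat, edge_width=1)

print(len(habitat))  # 29 cells of suitable habitat in total
print(len(core))     # 9 cells of core: the inner 3x3 of the big patch
print(core & small_patch)  # set(): the small patch is all edge
```

The small patch disappears entirely once the edge is removed, which is the "sufficiently small areas may not be occupied at all" case.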
Edges are quite impactful in terms of land use change, as I tried to illustrate in this picture:
- This just shows the edge on the area of habitat. The total splodge is the suitable habitat area, and the core is where the species will choose to live, avoiding the area marked edge.
- We may then think that if we change the land use of an area in the edge we don't impact the species...
- But in fact we just create a large edge that eats into the core area by an amount larger than just the area changed.
- Similarly for changing an area in the middle of the core
- The actual impact is amplified as there is an edge buffer all around the changed area, making it more impactful.
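The amplification in the steps above can be checked on a toy grid. This is a sketch under the same assumptions as the picture rather than anything from the real pipeline: a one-cell edge width, a square neighbourhood, and hypothetical names (`core_cells`, `core_loss`, `edge_change`, `core_change`).

```python
from itertools import product


def core_cells(habitat, edge_width=1):
    """Habitat cells whose whole square neighbourhood is also habitat."""
    offsets = list(product(range(-edge_width, edge_width + 1), repeat=2))
    return {
        (r, c)
        for (r, c) in habitat
        if all((r + dr, c + dc) in habitat for dr, dc in offsets)
    }


def core_loss(habitat, changed, edge_width=1):
    """Number of core cells lost when the `changed` cells stop being habitat."""
    before = core_cells(habitat, edge_width)
    after = core_cells(habitat - changed, edge_width)
    return len(before) - len(after)


# A 7x7 block of suitable habitat; its core is the inner 5x5 (25 cells).
habitat = set(product(range(1, 8), repeat=2))

edge_change = {(1, 4)}  # one cell in the edge zone
core_change = {(4, 4)}  # one cell in the middle of the core

print(core_loss(habitat, edge_change))  # 3
print(core_loss(habitat, core_change))  # 9
```

Changing a single edge cell still costs three core cells here, and changing a single core cell costs nine, because a fresh edge buffer forms around whatever was changed.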
Working out how to use this is subtle though, I think: on one hand you do want to account for the edge effect when working out the area that can support a species, but if you're looking to monitor the area where any changes could impact that population you need to use the entire habitable area, as even changes in the edge zone will impact the habitable core zone. My job now is to follow that through for the biodiversity metric pipelines I have and ensure I use the appropriate version of AoH at each step, which might mean I need to calculate both for each species.
Data pipeline tools
I made a start on writing up the discussion session I ran at the Nordic-RSE conference, but got sucked into trying to understand the detail of DVC and Snakemake, both of which had strong advocates in the session. The current tl;dr is that I like the idea of DVC and how it ties code and data together, but it lacks the ability to do the detailed dependency analysis that I'd want from a build system, whereas Snakemake has that level of detail but a much poorer user experience (subjective, I appreciate).
My secondary motivation here is that right now, for both the LIFE and STAR pipelines I've written, the best shareable way to run them is via a shell script. Both were developed using Shark, our own experimental data pipelining tool, but that's a bit too experimental for me to expect others to run, so I fell back on the shell script solution. That does a bad job of only rebuilding the necessary parts of the pipeline when any of the inputs update; for that I want a proper build system, so I'm hoping that something from this exploration will give me another way out.
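For a sense of what "only rebuilding the necessary parts" means in its simplest form, here is a sketch of a make-style staleness check: a step reruns only if an output is missing or older than one of its inputs. This is a generic illustration, not how Shark or my pipelines work, and I'm not claiming it matches what DVC or Snakemake do internally. The function names and the file names `habitat.csv` and `aoh.csv` are hypothetical, and it assumes every step has at least one input.

```python
import os
import tempfile
from pathlib import Path


def is_stale(inputs, outputs):
    """True if any output is missing or older than the newest input."""
    outputs = [Path(p) for p in outputs]
    if not all(p.exists() for p in outputs):
        return True
    newest_input = max(Path(p).stat().st_mtime_ns for p in inputs)
    oldest_output = min(p.stat().st_mtime_ns for p in outputs)
    return newest_input > oldest_output


def run_step(inputs, outputs, action):
    """Run `action` only when the outputs are stale; report whether it ran."""
    if is_stale(inputs, outputs):
        action()
        return True
    return False


def demo():
    """Run one step three times, changing the input before the third run."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "habitat.csv"
        out = Path(tmp) / "aoh.csv"
        src.write_text("raw data")

        def build():
            out.write_text(src.read_text().upper())

        ran = [run_step([src], [out], build)]  # output missing, so it runs

        # Set explicit timestamps so the demo doesn't depend on clock resolution.
        os.utime(src, ns=(1_000_000_000, 1_000_000_000))
        os.utime(out, ns=(2_000_000_000, 2_000_000_000))
        ran.append(run_step([src], [out], build))  # up to date, so skipped

        os.utime(src, ns=(3_000_000_000, 3_000_000_000))  # input "updated"
        ran.append(run_step([src], [out], build))  # stale again, so it runs
        return ran


if __name__ == "__main__":
    print(demo())
```

A proper build system does far more than this (dependency graphs across steps, parameters, content hashes), which is exactly why I'd rather adopt one than grow the shell script.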
Outreachy
Outreachy kicked off, with Shreya Pawaskar joining to help with Claudius. This led to me doing a bit of work to tidy up a few loose ends that I'd been putting off and that I didn't want Shreya to have to deal with, but it was great to have them around to review my PRs for that!
It's also forced me to be pragmatic, and work around a problem I have with OCaml's build system dune. None of my work is in Opam yet, as I don't feel it's met the quality bar required in terms of documentation etc., so if people want to use libraries I've built then, on the guidance of others, I point them at dependency pinning, whereby you can specify a GitHub repository for a dependency in your project's dune file, and then you run dune pkg lock and it'll fetch the pinned dependencies directly for you.
This works fine, unless you have a submodule in your project. Claudius does use submodules for certain non-code resources, like the default font that is used for rendering text. Although this could be added as a subtree, my non-humble opinion is that a submodule is more appropriate here, as we don't care about the font's history, or indeed tracking updates. But dune pkg lock does not cause submodules to be fetched, and so currently Claudius breaks if you try to add it as a pinned dependency. The ticket for this on dune has sat for a while now, and given that Claudius is yet to gain the fame and attention it deserves, I suspect my complaints won't move the needle there. Thus I'm going to have to add my resources as subtrees and accept the history pollution this will cause, but it's a lot better than not having Claudius usable at all.
Summer interns
Looks like we have one undergrad interested in helping with 3D printing geospatial data over the summer, and I'm chatting to Tiff Ki about providing support for 3D-printing camera jigs for digitising insect collections.
This week
- Write up some research ideas that have sat in my head for a while that I'm not getting time to act on - my hope is that by at least documenting them I can justify parking some of my current tasks or encourage others to at least run with the ideas so they might have impact.
- I need to write a quick overview of the AoH methodology for inclusion in some guidelines that the IUCN are pulling together.
- More on trying to write up my Nordic-RSE session on data pipelines.
- COVID booster jab - apologies in advance if I'm whinging on Friday :)