Weeknotes: 3rd March 2025

Last week

Using Go with Wasm and Web Workers

As mentioned last week, I spent some time learning how to use Go with WebAssembly, and with Web Workers for parallelism in the browser. I wrote a long post about the details, if that's of any interest.

Webplats

To make the above post I had to tweak Webplats, the software I wrote that hosts this site. The technical details aren't that exciting, but it's nice to observe how liberating it is to hit a limitation in your hosting tool and be able to quickly change it to remove that limitation.

However, I also had to disable code syntax highlighting, as hilite (or possibly one of the libraries it uses to tokenise the code) struggles with Go, and no syntax highlighting seemed a better fallback than confusing syntax highlighting. This is the flipside of the liberation, of course :)

ZFS vs free memory

I spent a bunch of time chasing what was mysteriously consuming all the RAM on one of our compute servers. For the last couple of weeks the "used" memory was reported as very high, despite there being no processes to which that memory was attributed. We had a bunch of suspicions, but in the end it appears to be ZFS: the caches ZFS uses are reported as "used" rather than "buffers/caches" in most memory reports on Linux. I can confirm this with:

$ cat /proc/spl/kstat/zfs/arcstats | grep data_size
data_size                       4    579687081472

Yes, that's 580 GB of cache in RAM :)

I've no idea why we're seeing this now rather than in the past, as judging by what I've read on Stack Overflow this has been the behaviour of the ZFS ARC for a long time. Either way, it makes it quite hard to work out how much memory is actually usable on a shared machine.
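For the curious, here is a minimal sketch of reading the same counters in a friendlier form, assuming OpenZFS on Linux, which exposes the ARC statistics in the kstat file used above. The `arc_gib` function name is a hypothetical helper of my own; it picks out the total ARC size (`size`) and its data portion (`data_size`) from the name/type/value lines and prints them in GiB. The units are worth a moment's thought: 579687081472 bytes is roughly 580 GB in decimal units, but only about 540 GiB in the binary units that tools such as `free -h` use by default.

```shell
#!/bin/sh
# Minimal sketch: report the ZFS ARC's total size and its data portion in GiB.
# Assumes OpenZFS on Linux; arc_gib is a hypothetical helper name.
arc_gib() {
    # Default to the kstat file used above; a path argument overrides it.
    awk '
        # Counter lines have three columns: name, type, value (bytes).
        # Match exact names so that e.g. "hdr_size" is not picked up.
        $1 == "size" || $1 == "data_size" {
            printf "%-10s %8.1f GiB\n", $1, $3 / 1073741824
        }
    ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

Run `arc_gib` with no arguments on the ZFS host, or pass it the path to a saved copy of arcstats to inspect that instead.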

LIFE

Alison and I have been working away at an update to the LIFE maps, fixing some issues with the species filtering in particular. For example, in the original data some bird species were filtered out early because GDAL silently ignored any range vectors with the geometry type "surface" rather than "polygon". Had it raised an error we would have noticed and fixed this, but alas not.

After much checking over the last couple of months, we're ready to put up a new set of layers on Zenodo.

iNaturalist data

For a lot of what I work on we use the IUCN Red List for species range data. The IUCN range data is generated via expert assessment and is generally considered the best source of data to work with. However, like anything, it has some limitations: some are technical (e.g., there are always more species to find and assess), and some are non-technical, which is where licensing comes into play.

Whilst data from the IUCN Red List is open to access, publishing derived works requires written permission from the IUCN, which is quite the restriction for smaller projects. I understand the line the IUCN needs to walk: it takes data from many sources, and it wants to ensure that commercial uses fund further research. But I've occasionally had an idea for a fun data visualisation, or a tutorial based on the work I do in biodiversity, and those would technically need approval from the IUCN for me to share them, so they invariably get put to one side, which is sad. This is particularly frustrating for someone like me who likes to work in the open where possible.

However, last week iNaturalist published a set of open range maps generated from their citizen-science observation data. The general opinion from the ecologists I've spoken to seems to be that this data isn't as good as the Red List expert assessments (which makes sense), and it's nowhere near as rich as the IUCN data, which breaks ranges down by things like breeding and non-breeding seasons, and even covers areas where the species is now extinct or has been artificially introduced. Still, it is a data source I could use for a bunch of the smaller fun things I've been wanting to share.

The iNaturalist data is properly open under a Creative Commons CC-BY licence, which means I can do my fun visualisations and tutorials with it without needing the IUCN's blessing, and other people can then take that knowledge and apply it to the richer IUCN data in their own projects when they need better information.

To get a sense of how useful this data is, I've started doing some simple analysis on parts of it to compare with what I'd expect from the IUCN data.

Next week

iNaturalist data

I want to dig a bit further into the iNaturalist data to understand its limitations compared with what I'm used to working with, so I know where it is and isn't sensible to apply it.

Nordic RSE

The CFP is out for the 2025 Nordic-RSE conference, and I plan to submit a couple of things, so I need to pull those together.

LIFE

Now that Ali and I have a fresh set of layers, I need to regenerate the results for the next piece of LIFE work based on those new maps.

STAR

I got some feedback from Chess on my STAR species filtering, so I will take that for another spin, and at the same time follow through on my promise from last week's notes to port the filtering report code I recently added to the LIFE pipeline over to my STAR code.

Interesting links

  • I read "Glaciers, gender, and science: A feminist glaciology framework for global environmental change research", a paper from 2016 looking at how masculinity has shaped not only who does research into glaciers, but also the kind of research that gets done: for instance, we know more about glaciers that are hard to get to than about those whose melting would have a bigger impact on communities. Worth a read just to understand all the secondary effects that male dominance of a field can have.

  • I happened across Parsl, which isn't a parser but a Python parallelism library. I've not tried it yet, but it's obviously relevant to the kind of work I do, so I should.

Tags: wasm, zfs, webplats, life, inaturalist