Weeknotes: 17th April 2023
What I did last week
Swift GDAL replacement work
I’ll confess to feeling a bit stuck in a quagmire last week as I tried to push this bit of work to the point of getting some useful results. The main learning from last week was just the realisation of how much is implicit in our calculations that use Python. In the current Python persistence pipeline we’ll happily do things like:
AoH = area_map * elevation_map * habitat_map * range_raster
What numpy hides is that all these maps have different types: the area-per-pixel raster is Double, the range_raster is UInt8, and the elevation_map and habitat_map are effectively bool. In a stricter language, such as Swift or OCaml, we’d be forced to use the correct types throughout. On one hand, this makes the code harder for ecologists to work with; on the other, given what we know about things like floating point arithmetic not being associative, the point at which each type gets converted to a different representation can actually affect the result!
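To make the implicit conversions concrete, here is a minimal sketch using tiny made-up 2x2 arrays in place of the real rasters. The dtypes follow the description above, but the values, the `mask` name, and the assumption that range_raster only holds 0 or 1 are mine, not taken from the actual pipeline.

```python
import numpy as np

# Hypothetical 2x2 stand-ins for the real rasters; values are invented.
area_map = np.array([[1.5, 2.5], [3.5, 4.5]], dtype=np.float64)
elevation_map = np.array([[True, False], [True, True]], dtype=np.bool_)
habitat_map = np.array([[True, True], [False, True]], dtype=np.bool_)
range_raster = np.array([[1, 1], [1, 0]], dtype=np.uint8)

# Implicit version: numpy promotes types at each multiplication, left to
# right, so every intermediate here is already float64.
aoh_implicit = area_map * elevation_map * habitat_map * range_raster

# Reordering changes the intermediate types: bool * bool stays bool, and
# bool * uint8 becomes uint8, before anything touches a float.
mask_only_dtype = (elevation_map * habitat_map * range_raster).dtype

# Explicit version: build one boolean mask, then convert exactly once.
# Assumes range_raster is a 0/1 presence layer.
mask = elevation_map & habitat_map & (range_raster != 0)
aoh_explicit = area_map * mask.astype(np.float64)

print(aoh_implicit.dtype, mask_only_dtype, np.array_equal(aoh_implicit, aoh_explicit))
```

With 0/1 masks the two versions agree; the point is only that the explicit form states where the single conversion to floating point happens, which is what a stricter language would force on us.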
I guess the ideal is that we ensure that we know what version of things like numpy were used for a given result so we can assert some knowledge about when things were forced to change type. If only there was a system that’d encode that in the results…
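As a very rough sketch of what recording that might look like in plain Python, the helper below captures the library versions and the result's dtype alongside a result. The function name `describe_result` and the choice of fields are hypothetical, just to illustrate the idea.

```python
import json
import platform

import numpy as np


def describe_result(result: np.ndarray) -> dict:
    """Hypothetical helper: capture enough context to reason later about
    which type promotion rules were in force when a result was computed."""
    return {
        "numpy_version": np.__version__,
        "python_version": platform.python_version(),
        "dtype": str(result.dtype),
        "shape": list(result.shape),
    }


# Example: a made-up uint8 raster standing in for a real result.
metadata = describe_result(np.zeros((2, 3), dtype=np.uint8))
print(json.dumps(metadata, indent=2))
```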
Anyway, this caused me to end up spending more time than I’d thought trying to nudge Swift’s type system into letting me have abstract datatypes for layers that actually have concrete types under the hood. I did at least get to the stage of having a Swift Yirgacheffe layer that’ll do chunked rasterisation of vector range layers with their minimal bounds.

The other thing it made me ponder was how to make this zero copy. As is the case with a certain class of modern languages, you get the “with” pattern for accessing raw data a lot in Swift, so as to ensure safe access. Python does use with clauses for things like file access too; we just don’t see it for access to data frames in numpy, pandas, etc. But I envisage that under the hood we’d end up with Swift data layers doing:
result.withRawData { result_data in
    layerA.withRawData { layerA_data in
        layerB.withRawData { layerB_data in
            for index in 0..<count {
                result_data[index] = layerA_data[index] + layerB_data[index]
            }
        }
    }
}
This again looks very unfriendly to those thinking about ecological calculations, but it works well under the hood, and it adds weight to my thought that we don’t want to mix the code that does the hard work with the code that lets people express the higher level ecological algorithms.
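For comparison, here is a minimal Python sketch of the same separation: the scoped raw-data access and the write into a preallocated buffer live inside a layer class, while the calling code just writes `layer_a + layer_b`. The `Layer` class and its `raw_data` method are hypothetical stand-ins for illustration, not Yirgacheffe’s actual API.

```python
from contextlib import contextmanager

import numpy as np


class Layer:
    """Hypothetical stand-in for a raster layer backed by a numpy array."""

    def __init__(self, data: np.ndarray):
        self._data = data

    @contextmanager
    def raw_data(self):
        # Yield the backing array itself rather than a copy. A real
        # implementation could pin or lock the buffer here and release
        # it again when the with block exits.
        yield self._data

    def __add__(self, other: "Layer") -> "Layer":
        result = Layer(np.empty_like(self._data))
        with result.raw_data() as out, self.raw_data() as a, other.raw_data() as b:
            # Write straight into the result buffer, avoiding a temporary.
            np.add(a, b, out=out)
        return result


# The "ecologist-facing" side never sees the nesting.
layer_a = Layer(np.array([1.0, 2.0, 3.0]))
layer_b = Layer(np.array([10.0, 20.0, 30.0]))
total = layer_a + layer_b
```

The nested access is still there, but it is written once in the layer code rather than in every calculation.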
More Yirgacheffe usage in team
I’m still keen to push Yirgacheffe as a way to help speed up some of the work I see our team doing, so I was glad to get the chance to try this out on another person from the ecology side.
I met up with one of the ecologists to chat about Yirgacheffe, after they unintentionally caused our main compute server to run out of memory, which impacted a bunch of things. It seems Yirgacheffe as it stands will work for the kinds of calculation they’re doing, so they’re going to give it a whirl. As with chatting to other potential users, I think I could make it more useful if I added automatic layer resampling to Yirgacheffe.
Paper rejected
The Ark paper got rejected, but we got a bunch of useful feedback from the reviewers. They liked the survey section of the paper, so that at least feels like a contribution we need to get somewhere. They felt the latter half lacked a compelling argument as to why this specifically was needed and what it specifically was, so we need to work on the clarity of that latter half of the paper.
Methodology doc
I did a first pass on Keshav and Co’s methodology paper, enough to realise I need a thermos of fresh tea and a large notepad before I do a second pass :)
Robin’s FS work
I had a quick review of Robin’s file system work, and a brief chat with him to ensure I got the basic lay of the land. Thankfully Robin has documented his work - thanks Robin!
EEG meeting
I chaired a very quiet EEG meeting - but it was good to be forced to try and commit group member names to memory :) We agreed a 4C away day in the field was a good idea and that we’d let Anil sort that out for us :p
This week coming
- Try to get something out of this Swift work to wrap it up and let me go back to higher level things
- Meeting with Tom and Patrick to go through the methodology document
- Fork the paper and consider how we might target it elsewhere
- Chat to Charlotte about database access and setup