How to update a Tezos Smart-Contract
25 Jan 2023
Tags: 4C, blockchain, smart contracts, tezos
The Cambridge Centre for Carbon Credits (4C) has been looking at using a distributed ledger, aka blockchain, to make a public, immutable, verifiable store of information about carbon-offsets as a way of providing traceability into the validity of that offset. Whilst blockchains are not my domain of expertise, I was tasked earlier this year with getting the project’s smart contract ready for production, being the experienced generalist software-engineer on the team. I’ve previously written up some of the trials of getting started with the Tezos blockchain platform from a software-engineering perspective, and this post is more of the same, but focussed on one particular area: how do you ship updates to your smart-contract once you’ve shipped it?
The problem is this: any bit of software evolves over time, either to fix bugs, refine existing features in the light of new knowledge, or to add support for new functionality. However, one of the reasons people use blockchains is because they are public immutable data-structures, which can only be appended and never changed: this means that the smart-contract code you instantiated to run on the blockchain has the same properties of immutability, by virtue of being stored in that same blockchain on which it is running. This leaves us in a bit of a quandary.
This post then looks both at why people built a system that is so hard to upgrade (which initially wasn’t as obvious to me as I’d assumed), and at the different strategies that people use for upgrading smart-contracts. As before, this article is very much focussed on Tezos (who, for clarity, are one of 4C’s funders), but I have dragged in references from the Ethereum blockchain this time, as they have more documentation around upgrades that I was able to learn from; Tezos is still relatively young, and as such its documentation is not quite as rich as Ethereum’s.
Two quick notes before I dive into things:
Firstly, one on terminology. I keep using the term “upgrade” here, as that’s the term used in the blockchain community for an update to a smart-contract. But just to be clear, all we’re talking about here is replacing one version of a smart-contract with a new version of the same code with some small changes made. I don’t like the term “upgrade” as it implies a subjective appraisal of the new code, but given that’s the term everyone uses, in particular the other documents I’ll reference, I’ll be using it here too.
Secondly, keep in mind whilst reading this that executing smart-contracts costs money, and the amount it costs is directly proportional to the amount of computation you do. Usually that amount per execution (if you’ve designed your smart-contract well) is small, but at scale might be significant. To be fair to smart-contracts, this is technically true of any code, as you pay for the energy it consumes etc., but it’s just more apparent on a smart-contract, as the price per unit of compute is higher than on your desktop or phone, and you get told what it costs each time you call a smart-contract.
Why is upgrading so hard?
There are two parts to why smart-contract upgrades are hard: a technical side, and a social side. Let’s look at both of those before we get onto how we can then achieve an upgrade.
To make the rest of the discussion easier, here’s a quick recap of the technical problem, as seen from someone working on the Tezos platform specifically. This may or may not apply to other smart-contract platforms, I haven’t yet looked at others to see how they deal with it in detail.
Somewhere in my last post I have this quick reduction of a smart-contract:
“As a technologist, it’s probably best to ignore both the “smart” and “contract” parts of the term, and instead think of them as small bits of server-side code of limited complexity with some associated storage. The code will have one or more API endpoints where you can call it to either read the storage (or values derived from that storage) or to cause the storage state to mutate.”
The bit I skipped over in that summary is that a smart-contract is stored on the blockchain and accessed via an address that locates it on the chain. Because you can never update a blockchain, only append to it, this means the code at that address used to access the smart-contract is both unchangeable and undeletable - both desirable properties of blockchains generally, but less so of software. You may choose to update the code you have for your smart-contract and re-deploy it, but all you’re doing then is creating a second/new instance (with a new reference address) of that smart-contract, not updating or replacing the old version. Your existing users will be blissfully unaware of what you’ve done as the old contract will continue to function as before. Even the state of the old contract isn’t there for your new contract unless you manually take a copy - storage on a smart-contract is tied to the specific instance on the blockchain.
A blockchain is a mechanism used to ensure auditability and visibility, so all this makes sense. But as a software-engineer, where I see deployed code I assume there must be a way for me to ship updates. What seemed to be missing when I read the documentation for Tezos and the Ligo language we use were the design patterns and/or tooling for making updates that should go alongside this technical limitation, similar to how we have tooling for database-migrations etc. on web-services. That is where we come on to the social aspect.
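To make that concrete, here is a toy Python model of the behaviour described above. It is only a sketch: `ToyChain`, `originate`, and the way the address is derived are all made up for illustration and are not how Tezos actually computes addresses.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Contract:
    code: str                                   # fixed at origination time
    storage: dict = field(default_factory=dict)  # belongs to this instance only


class ToyChain:
    """Append-only registry: contracts can be added, never replaced."""

    def __init__(self):
        self._contracts = {}

    def originate(self, code: str, storage: dict) -> str:
        # Hypothetical address scheme: hash of a counter plus the code,
        # so every deployment lands at a fresh address.
        seed = f"{len(self._contracts)}:{code}".encode()
        address = "KT1" + hashlib.sha256(seed).hexdigest()[:12]
        self._contracts[address] = Contract(code, dict(storage))
        return address

    def get(self, address: str) -> Contract:
        return self._contracts[address]


chain = ToyChain()
v1 = chain.originate("counter v1", {"count": 0})
chain.get(v1).storage["count"] = 5            # users interact with v1

# "Re-deploying" fixed code creates a second contract at a new address.
v2 = chain.originate("counter v2 (bug fixed)", {"count": 0})

print(v1 != v2)                               # two separate instances
print(chain.get(v1).code)                     # old code is still there
print(chain.get(v2).storage)                  # new instance starts without v1's state
```

The old instance keeps its code and its storage, and the new one knows nothing about either unless you copy the state across yourself.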
As a software engineer, used to dealing with all the upgrade complexity of web-services, I started out by seeing the lack of support for upgrading smart-contracts as just a problem of the immaturity of the domain: updates are obviously inevitable and/or desirable in any real-world software system, and so any problems in dealing with upgrades in Tezos are because the community hasn’t yet got around to tackling it.
But in digging into this I’ve found that this isn’t quite true: whilst the Tezos documentation doesn’t call it out as clearly as say Ethereum’s documentation does, having engaged with people over the topic on the Tezos slack there is clearly a bunch of thought there, but it doesn’t all go the way I expected.
I view smart-contracts as software, but to a lot of people in the community they are actually viewed as contracts - a thing where some set of resources (often that blockchain’s currency or some notion of a token representing an off-chain resource) are described in a binding manner. For these people upgrades are regarded with suspicion. One advantage of a smart-contract over a lot of software is that the code is public: it’s on the blockchain for all to see, so you can then validate that the contract does what it says it does before you tie any value to it (the aforementioned resources). If a contract could be easily upgraded, then it doesn’t matter that you had faith in the contract when you invested in it, as it might be updated to do something different at a later date, potentially making off with all that value.
I have to admit, not being into cryptocurrency or other popular tokens, and just coming at this purely as a software-engineer, I was a bit naive to this line of thinking, but I can see why some see friction in the upgrade process as a good thing. I guess this can be seen from my earlier quoted summary: I don’t see these things as contracts, just as bits of code, whereas some people definitely view them as contracts, and if you’re in this field you really need to consider whether your contract’s users will have these kinds of expectations.
So, there’s both a technical and a social aspect to why it is so hard, and I just wanted to flag this point: sometimes things are hard because people didn’t consider them, but also sometimes things are considered and left deliberately hard.
And it’s worth noting, if your users are of the “immutable is good” camp, that doesn’t mean you can’t upgrade your deployed smart-contract, it rather means you need to work in a way that brings your users along with you. For instance, one common use of smart-contracts has been for governance purposes, allowing people to vote on proposals etc. (referred to as multisig), and so you may want to build such a thing into your smart-contract if you have users who will want oversight into upgrades. I’m not going to cover that sort of thing here, but I think it’s worth being aware of this as an option.
I still want to update my contract, so how do I do it?
Whilst I understand that there are social concerns by some about upgradability, I’m personally more interested in the fact that no software survives first contact with the user, and so I want to know that if the 4C smart-contracts need updating they can be. But because upgrades are, by design, not easy to do, we need to design our smart-contracts with such a plan in mind.
Looking around the Tezos community there’s not really much published on this topic, but thankfully the Ethereum main website does have documentation around a bunch of strategies. I’m going to summarise some of these here, as ones I’ve considered as part of 4C’s plan, but if you’re really interested in this topic I recommend you go to the source material, as it covers things in more detail than I will here.
To help evaluate the costs of the mechanisms discussed I put together a simple smart contract that let me try them out in the various configurations discussed (single stage proxy, linked list of calls, lambda contracts). You can find the code for that on the 4C github repo. I did my testing on one of the Tezos testnets, so the costs will not match those of doing the work on the live Tezos mainnet blockchain, but it’s hoped that the proportional costs are indicative of any trends. My contract makes no attempt to migrate state between contracts, it just is there to check the calling overheads.
Upgrade strategy 1: replace, don’t upgrade
Ethereum’s guide refers to this as “contract migration”: This is where you simply issue a second smart-contract, copy across any state from the original contract that you care about, and then convince all your users to migrate to the new smart-contract. Sub-optimal from my personal perspective, but it does make all your users aware that the contract has been changed, as you need to tell them to use the new contract.
One important thing that my colleague Keshav pointed out about this approach (which surprisingly isn’t covered in the Ethereum documentation): your original contract is still active! Although you want to move everyone to the new address, there’s no guarantee that people will follow, and thus you’ve effectively forked your contract at this point. So if you do go down this route you probably want to add some cut-off switch to your contracts, to stop them working at a certain point - for instance you could store a successor address, which if not null would stop endpoints reacting, effectively making the old script inert. By storing the successor address, rather than just using a bool to indicate end-of-life, people can examine the contract storage and know where to find the replacement contract.
However, this kill-switch mechanism might be looked on unfavourably by those contract-mindset people as a way for you to remove value from them without consent, so I’d tread carefully with this one depending on your application and its expected user-base.
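As a rough illustration of the successor-address idea, here is a minimal Python sketch. It is a toy model rather than Ligo or Michelson, and `RetirableCounter`, `ContractRetired`, and the example addresses are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


class ContractRetired(Exception):
    """Raised when a call hits a contract that has named a successor."""


@dataclass
class RetirableCounter:
    owner: str
    count: int = 0
    successor: Optional[str] = None   # None means the contract is still live

    def _check_live(self) -> None:
        if self.successor is not None:
            raise ContractRetired(f"moved to {self.successor}")

    def increment(self) -> None:
        self._check_live()
        self.count += 1

    def retire(self, sender: str, successor: str) -> None:
        if sender != self.owner:
            raise PermissionError("only the owner can retire the contract")
        self._check_live()            # a contract can only be retired once
        self.successor = successor


old = RetirableCounter(owner="tz1owner")
old.increment()
old.retire(sender="tz1owner", successor="KT1new")

try:
    old.increment()
except ContractRetired as err:
    print(err)                        # tells the caller where to go next
```

Because the successor is stored as an address rather than a flag, anyone inspecting the old contract’s storage can find the replacement.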
Upgrade strategy 2: use a proxy contract
This is a fairly common pattern that people advocate. In this one you have a single shim contract that just stores the address of the real smart-contract, and the shim forwards calls to the actual smart-contract and passes back any responses to the caller. Upgrading then is just a matter of instantiating a new contract with migrated state, and then updating the pointer in the shim contract, and none of your users need be any the wiser that the upgrade has happened.
There are two downsides to this approach that immediately leap out at me. Firstly, this is racy: at some point you need to snapshot the current contract’s state and put it into a new contract, and then afterwards you update the shim to the new contract’s address: this leaves an opportunity for someone to call the original contract between the time you took the snapshot and the time you updated the shim. You could just put the shim into a state whereby it bounces all requests, then take the snapshot, create the new contract, and then update the shim, but this starts to feel fragile - someone is going to get that wrong one day. A variation then is to have two child contracts: one for code, and one for state, but that assumes you never want to migrate state along with the code.
Basically things are now more complicated, and so you need to be more careful, which feels really like the wrong thing to be doing in your smart-contract.
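The pause-then-migrate sequence can be sketched in Python as below. This is a toy model under my own assumptions: `Proxy`, `CounterV1`, `CounterV2`, and `upgrade` are illustrative names, and on a real chain each step of `upgrade` would be its own operation with time passing in between, which is exactly where the fragility comes from.

```python
class CounterV1:
    def __init__(self, count=0):
        self.count = count

    def bump(self):
        self.count += 1
        return self.count


class CounterV2(CounterV1):
    def bump(self):
        self.count += 10              # the "upgraded" behaviour
        return self.count


class Proxy:
    """Shim that forwards calls to whichever contract it currently points at."""

    def __init__(self, admin, target):
        self.admin = admin
        self.target = target
        self.paused = False

    def bump(self):
        if self.paused:
            raise RuntimeError("upgrade in progress, try again later")
        return self.target.bump()

    def _require_admin(self, sender):
        if sender != self.admin:
            raise PermissionError("admin only")

    def set_paused(self, sender, paused):
        self._require_admin(sender)
        self.paused = paused

    def set_target(self, sender, target):
        self._require_admin(sender)
        self.target = target


def upgrade(proxy, admin, new_contract_class):
    proxy.set_paused(admin, True)               # 1. bounce calls so state is frozen
    snapshot = proxy.target.count               # 2. snapshot the old state
    replacement = new_contract_class(snapshot)  # 3. create the new contract
    proxy.set_target(admin, replacement)        # 4. repoint the shim
    proxy.set_paused(admin, False)              # 5. resume service


proxy = Proxy("tz1admin", CounterV1())
proxy.bump()
proxy.bump()
upgrade(proxy, "tz1admin", CounterV2)
result = proxy.bump()
print(result)                                   # state carried over, new logic applied
```

Skipping step 1, or doing steps 2 and 4 in the wrong order, reintroduces the race described above.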
The second downside is concerned with security checks. Both Tezos and Ethereum have two ways of validating the caller to a smart-contract, which is how a smart-contract can check that the person trying to carry out an operation has the permissions to do so. You can either check who it was that started the transaction that includes the current operation (the SOURCE operation in Tezos, ORIGIN in Ethereum), or you can check who it is that directly invoked the current operation (the SENDER operation in Tezos, CALLER in Ethereum).
Now, the former call is problematic, as someone can craft a contract that you invoke, which then calls another smart-contract that uses the SOURCE/ORIGIN opcode as a security check, so even though you don’t think you’re invoking that second contract, you appear as the SOURCE/ORIGIN, and thus you lose all your tokens or whatever that contract was storing. So, as a result advice is given that you should use SENDER/CALLER for your checks, but the proxy model breaks that, as SENDER/CALLER is your proxy contract.
Again, one can work around this, by having your security checks distributed between your proxy contract and the actual contract: the former checks the CALLER is the right person (which may involve asking the actual contract) and the actual contract will only talk to the proxy contract, but again we’re adding complexity here that makes me nervous, particularly as some of that complexity is now embedded in the proxy contract that can never be updated.
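The difference between the two checks, and why a proxy gets in the way of the safer one, can be shown with a small Python model. Everything here is a simplification I’ve made up for illustration: `Ctx` stands in for the call context, `Vault` for a contract guarding some value, and `Forwarder` for any intermediate contract, whether a malicious one or your own proxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ctx:
    source: str   # account that started the transaction (SOURCE / ORIGIN)
    sender: str   # whoever directly invoked this call (SENDER / CALLER)


class Vault:
    def __init__(self, owner, check):
        self.owner = owner
        self.check = check            # "source" or "sender"
        self.balance = 100

    def withdraw(self, ctx):
        caller = ctx.source if self.check == "source" else ctx.sender
        if caller != self.owner:
            raise PermissionError(f"{caller} is not the owner")
        amount, self.balance = self.balance, 0
        return amount


class Forwarder:
    """Any contract in the middle of a call chain."""

    def __init__(self, address, target):
        self.address = address
        self.target = target

    def call(self, ctx):
        # The source is preserved, but the sender becomes this contract.
        inner = Ctx(source=ctx.source, sender=self.address)
        return self.target.withdraw(inner)


alice = Ctx(source="tz1alice", sender="tz1alice")

# Checking SOURCE: a contract Alice was tricked into calling can drain her vault.
source_vault = Vault(owner="tz1alice", check="source")
stolen = Forwarder("KT1evil", source_vault).call(alice)

# Checking SENDER: safe against that trick, but Alice's call via a proxy is rejected too.
sender_vault = Vault(owner="tz1alice", check="sender")
try:
    Forwarder("KT1proxy", sender_vault).call(alice)
    proxy_allowed = True
except PermissionError:
    proxy_allowed = False

print(stolen, proxy_allowed)
```

The workaround described above amounts to making the vault trust only the proxy’s address and moving the owner check into the proxy itself.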
Upgrade strategy 3: linked lists of contracts
This is a sort of extension to an idea I floated earlier about using an optional field to act as a kill-switch for your old contract: you could have an optional address that if set stops the old contract responding, and then people know where to find the new contract. Well, why not just cut the human out here, and when the address is set you forward calls from the old contract to the new contract, effectively turning your old contract into a proxy?
Similar complexity issues arise here as with the old proxy contract pattern with regard to security, and over time this chain will grow longer, increasing the cost of each call, and remember, every call to a smart-contract costs currency based on how much work it does. Indeed, in some guides to implementing smart-contracts I’ve read they even shy away from the simple proxy-model for fear of the disproportionate costs of a single contract redirection.
To test this I built a small five contract chain based on the test contract I mentioned at the start of this section. Using the test network I noted that invoking the contract cost ꜩ0.000479, and then each additional level of indirection cost me about ꜩ0.000171 more, giving me a linear cost increase per layer, and for roughly every three levels of indirection I’m adding 100% of the original contract cost. Thus if your contract has frequent updates, then this is not the ideal mechanism for you.
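The arithmetic behind that estimate is simple enough to write down. The sketch below just plugs the two testnet figures quoted above into a linear model; the function names are mine, and mainnet numbers will differ.

```python
BASE_COST = 0.000479   # tez for a direct call, as measured on the testnet
HOP_COST = 0.000171    # tez for each extra level of indirection


def call_cost(hops: int) -> float:
    """Total cost of a call that passes through `hops` forwarding contracts."""
    return BASE_COST + hops * HOP_COST


def overhead_percent(hops: int) -> float:
    """Extra cost relative to calling the contract directly."""
    return 100 * hops * HOP_COST / BASE_COST


for hops in range(5):
    print(f"{hops} hops: {call_cost(hops):.6f} tez "
          f"(+{overhead_percent(hops):.0f}%)")
```

Three hops add 3 × 0.000171 = 0.000513 tez, which is about 107% of the base cost, hence the "roughly every three levels" rule of thumb.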
However, these costs could be managed if you just treat it as a special case of the replacement strategy: you use the proxy as a backstop whilst you migrate everyone over to the new contract, but having the linked list means no one gets interrupted service, and the costs act as an incentive for people to migrate to the new contract.
Again, as with the proxy-contract this is also a racy option in terms of capturing the current contract’s state to put into the new contract, so you need to plan your upgrade strategy appropriately.
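Here is a minimal Python sketch of the forwarding chain, mainly to show how the hop count grows for people still using the oldest address. `Link` and its methods are hypothetical names, and like my test contract this makes no attempt to migrate state between versions.

```python
class Link:
    """One version of a contract; forwards to its successor once one is set."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self.successor = None

    def set_successor(self, successor):
        self.successor = successor

    def add(self, amount, hops=0):
        if self.successor is not None:
            # Retired: pass the call along, noting one more level of indirection.
            return self.successor.add(amount, hops + 1)
        self.value += amount
        return self.name, self.value, hops


v1, v2, v3 = Link("v1"), Link("v2"), Link("v3")
handled_by_v1 = v1.add(1)      # before any upgrade, v1 does the work itself

v1.set_successor(v2)           # first upgrade
v2.set_successor(v3)           # second upgrade

via_old = v1.add(1)            # old address still works, but takes two hops
direct = v3.add(1)             # migrated users pay for no indirection

print(handled_by_v1, via_old, direct)
```

Callers who stay on the original address pay for every hop, which is the incentive to migrate mentioned above.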
Upgrade strategy 4: lambdas in storage
One neat and/or problematic feature about Tezos is that you can store code in a contract’s storage and update that functionality by writing to the storage. I say both neat and problematic as which it is depends on the hat I’m wearing: as a computer scientist it’s neat as it clearly provides you an easy way to do contract upgrades without having to migrate the contract or add indirection, but from a software-engineering perspective, it makes it very hard to do testing and assert anything about the state of what’s happening in your smart-contract. I also imagine that if I was of the aforementioned mindset that contract updates must undergo scrutiny then this might make me quite nervous, though you could use logic on the contract by having a proposal variable on your contract where you store the proposed replacement, and then hold a multisig vote to apply the update.
Cost-wise I was surprised to see that there is no overhead using lambdas. In my test contract I had the same state update code invoked either directly from a contract entrypoint as part of the contract’s Michelson code, or I had it invoked using the lambda functionality, with the Michelson stored in the contract’s storage. The resulting costs of invoking the contract were the same. Thus for a contract where you want to do regular updates this mechanism is suitable.
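The shape of the lambda approach can be sketched in Python, with a callable standing in for Michelson code held in storage. This is only an analogy: `LambdaContract`, `apply`, and `set_logic` are names I’ve invented, and it says nothing about real gas costs.

```python
class LambdaContract:
    def __init__(self, admin, logic):
        self.admin = admin
        # The update logic lives in storage next to the data it operates on.
        self.storage = {"value": 0, "logic": logic}

    def apply(self, amount):
        """Entrypoint: run whatever logic is currently stored."""
        logic = self.storage["logic"]
        self.storage["value"] = logic(self.storage["value"], amount)
        return self.storage["value"]

    def set_logic(self, sender, logic):
        """Entrypoint: replace the stored logic; address and data are untouched."""
        if sender != self.admin:
            raise PermissionError("admin only")
        self.storage["logic"] = logic


contract = LambdaContract("tz1admin", lambda value, amount: value + amount)
contract.apply(5)

# "Upgrade": same contract object, same stored value, new behaviour
# (this version refuses to go below zero).
contract.set_logic("tz1admin", lambda value, amount: max(0, value + amount))
contract.apply(-20)

print(contract.storage["value"])
```

Note that nothing in the entrypoint’s own code tells you what `apply` will do, which is the testing and auditing worry raised above; gating `set_logic` behind a vote would be one way to address it.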
Upgrade strategy 5: use Tezos Domains, not raw addresses
The root cause of a lot of the problems with upgrades comes from the standard practice of using the blockchain address of the smart contract as the reference. This can be compared to say using the IP Address of a network server - it’ll work in theory, but if the service you use ever moves to a new server then using the IP address will break, which is why we tend to use human readable domain names that provide an updateable reference to an IP address to access servers rather than the IP address itself (okay, this analogy is based in the Internet of the 90s, but bear with me).
A similar approach in theory is available for Tezos: there is Tezos Domains, which provides a similar way of mapping a human readable name to a smart-contract address. So, we can just give out our Tezos domain name rather than our smart-contract’s address and all is good, right?
In theory this would be good, but there’s a lot of implicit assumption currently that you refer to things by their address. Tokens in the smart-contract world are usually seen as an (address, id) tuple, and to change that would require smart-contracts that manage tokens to start having to both change their storage types and to invoke the Tezos Domains smart-contract to do address resolution, which then adds both an additional financial cost to invoking the contract, and a dependency on a third party contract that people may not want to have (if you’re the kind of person that thinks part of the reason to use a blockchain is to not rely on centralised services).
With the latest iteration (at time of writing) of Tezos, they added a new event feature, whereby a contract can publish events as part of an API call, allowing people to see things that happened on the contract over time without having to replay the entire transaction history for the contract. This is quite a useful feature, but the important thing to note is that events are tied to a specific contract instance, which means that if your contract uses events you need to be careful about how your upgrade strategy will work.
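A small Python sketch of the trade-off: `NameRegistry` here is a made-up stand-in for a naming contract and is not the real Tezos Domains API, and the name `carbon.tez` and the addresses are purely illustrative.

```python
class NameRegistry:
    """Hypothetical stand-in for a naming contract, not the Tezos Domains API."""

    def __init__(self):
        self._names = {}
        self.lookups = 0              # each lookup would be an extra paid call

    def register(self, name, address):
        self._names[name] = address

    def resolve(self, name):
        self.lookups += 1
        return self._names[name]


registry = NameRegistry()
registry.register("carbon.tez", "KT1old")

# Two ways another contract might record "token 42 from our contract".
token_by_address = ("KT1old", 42)     # the usual (address, id) tuple
token_by_name = ("carbon.tez", 42)    # would need a different storage type

# We upgrade by deploying a new contract and repointing the name.
registry.register("carbon.tez", "KT1new")

resolved = (registry.resolve(token_by_name[0]), token_by_name[1])
print(token_by_address)               # still points at the old contract
print(resolved, registry.lookups)     # follows the upgrade, at the cost of a lookup
```

The name-based reference survives the upgrade, but every holder has to store names instead of addresses and pay for the extra resolution step through a third-party contract.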
Firstly, only the proxy and lambda patterns will let you keep your full event history, as they’re the only strategies that will let you keep the same public facing address across upgrades.
Secondly, if you’re using a proxy just to forward calls to the “real” smart-contract, you can’t emit events from the actual smart-contract; emits will have to happen from the proxy contract, which means you again are dividing your functionality up across multiple contracts, both increasing complexity, and limiting the scope of what you can do if you don’t design a flexible enough boundary between the two, particularly given how strongly typed the APIs are in Tezos.
This ties back to events, but applies more generally to the side-effects of any smart-contract. Until the 11th version of Tezos, there were no events, and in the last section I pointed out that if you’re using the proxy pattern then you’d need to ensure that your proxy handled the events. More generally, your proxy has to handle any side-effects of your smart-contract, but you can’t make it handle side-effects that don’t exist in the platform at the time of writing the proxy, only the inner upgradable smart-contract can do that. One big feature of Tezos vs other blockchains is that it has a model that lets the chain be upgraded with new features over time, so we have to assume that in the future there will be new side-effects that it’ll be desirable to use, but that we may be restricted from taking advantage of fully due to how we’ve chosen to implement our upgrade strategy. In this case you either want to use lambdas, or you want to trade off the benefits of a constant smart-contract address with being able to use new side-effects.
In talking to the Tezos community about update strategies, Alexander Eichhorn, a developer at one of the major Tezos indexer services (and of the tzgo library that we use at 4C), pointed out that they take advantage of code hashing to identify common/popular contract types. Indexers try to make a more human friendly view on the blockchain, so they try to understand what contracts are doing and present that in a meaningful way. What this developer was pointing out was that the APIs for token contracts all tend to be similar, so they need to do deeper inspection of the contract to work out what it is, and upgradability inherently breaks that (this works because a lot of tokens etc. will use a common off-the-shelf smart-contract).
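To illustrate why the emitting address matters, here is a toy Python model in which the event log is keyed by the address of whichever contract emitted the event. All of it is assumed for illustration: `emit`, `Logic`, `EmittingProxy`, and the "retired" event are invented names, and this is not how you call the real Tezos event feature.

```python
from collections import defaultdict

# History as an indexer might present it: one list of events per emitting address.
event_log = defaultdict(list)


def emit(emitter, tag, payload):
    event_log[emitter].append((tag, payload))


class Logic:
    """The 'real' contract; can emit for itself or hand the event back."""

    def __init__(self, address):
        self.address = address

    def retire(self, credit_id, emit_here=True):
        event = ("retired", credit_id)
        if emit_here:
            emit(self.address, *event)
        return event


class EmittingProxy:
    """Proxy that emits on behalf of whatever contract it forwards to."""

    def __init__(self, address, target):
        self.address = address
        self.target = target

    def retire(self, credit_id):
        event = self.target.retire(credit_id, emit_here=False)
        emit(self.address, *event)


# With the proxy emitting, history stays in one place across an upgrade.
proxy = EmittingProxy("KT1proxy", Logic("KT1v1"))
proxy.retire(1)
proxy.target = Logic("KT1v2")         # upgrade
proxy.retire(2)

# With the inner contracts emitting, each version has its own separate history.
Logic("KT1v1").retire(3)
Logic("KT1v2").retire(4)

for address, events in event_log.items():
    print(address, events)
```

The cost of the first arrangement is that the proxy, which can never be updated, has to understand every event the inner contract might ever want to publish.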
Whilst this clearly isn’t something that should stop you fixing bugs in your smart-contract, depending on your use case and how much you rely on indexers, this is something you may want to consider when upgrading.
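The code-hashing idea can be sketched in a few lines of Python. How real indexers compute and store these hashes is an assumption on my part; the contract "code" strings below are placeholders rather than real Michelson, and `classify` is a made-up helper.

```python
import hashlib


def code_hash(code: str) -> str:
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


# Placeholder text standing in for a widely reused off-the-shelf token contract.
STANDARD_TOKEN_CODE = "token contract: transfer, balance_of, update_operators"

# The same contract with an upgrade hook added: same public API, different code.
UPGRADABLE_TOKEN_CODE = STANDARD_TOKEN_CODE + ", set_logic"

# An indexer's table of code hashes it already knows how to present.
KNOWN_CONTRACTS = {code_hash(STANDARD_TOKEN_CODE): "standard token"}


def classify(code: str) -> str:
    return KNOWN_CONTRACTS.get(code_hash(code), "unknown")


print(classify(STANDARD_TOKEN_CODE))
print(classify(UPGRADABLE_TOKEN_CODE))
```

Any change to the code, including the plumbing that makes a contract upgradable, produces a different hash, so the indexer no longer recognises it as the familiar contract type.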
As is probably clear by now, there is no single winning pattern to follow for making your smart-contract upgradable. What I have hopefully convinced you of is that if you’re planning to do a smart-contract then you want to think about your upgrade strategy before you deploy rather than after you find your first reason to upgrade, as by then it may be too late to do a clean upgrade.
Some questions to ask yourself:
Will the users of the smart-contract be invested (both figuratively and literally) in the fact that the contract is either impossible or hard to change? Do you need to build in a governance/voting system to bring them along with you on the upgrade path?
Do you need to have the root address of your contract be permanent, or can you get away with asking people to periodically update their references to your new version?
Do you care about history of side-effects like events on your contract?
Do you care about how indexers present your contract to others?
What are the financial costs of adding abstractions to make your contract upgradable?
There is no one size fits all solution, but hopefully by using the above set of questions you can narrow down the set of patterns to the one that best fits your situation.
My thanks to my 4C colleagues who helped with this: Patrick Ferris, Sadiq Jaffer, Srinivasan Keshav, Anil Madhavapeddy, and Derek Sorensen. Additional thanks to Alexander Eichhorn on the insights around how indexers work.