Substrate Merklized Storage
How to use the slides  Full screen (new tab)
Substrate Storage
What We Know So Far
v
What We Know So Far

Recall that at the
sp_io
layer, you have opaque keys and values. 
sp_io::storage::get(vec![8, 2])
;vec![8, 2]
is a "storage key".

sp_io::storage::set(vec![2, 9], vec![42, 33])
;
v
What We Know So Far
Nomenclature (with some simplification):
Environment providing host functions, namely storage ones: "
Externalities
Environment".
Notes:
 In Substrate, a type needs to provide the environment in which host functions are provided, and can be executed.
 We call this an "externality environment", represented by
trait Externalities
.  By convention, an externality has a "backend" that is in charge of dealing with storage.
v
What We Know So Far
#![allow(unused)] fn main() { sp_io::TestExternalities::new_empty().execute_with( { sp_io::storage::get(..); }); }
v
What We Know So Far
Key Value
 How about a keyvalue storage externality? why not? 🙈
v
Key Value
v
Key Value
 "Storage keys" (whatever you pass to
sp_io::storage
) directly maps to "database keys".  O(1) read and write.
 Hash all the data to get the root.
Notes:
Good time to hammer down what you mean by storage key and what you mean by database key.
literally imagine that in the implementation of sp_io::storage::set
, we write it to a keyvalue
database.
v
Key Value
 If alice only has this root, how can I prove to her how much balance she has?
SEND HER THE WHOLE DATABASE 😱.
Notes:
Alice is representing a light client, I represent a full node.
v
Key Value
 Moreover, if you change a single keyvalue, we need to rehash the whole thing again to get the updated state root 🤦.
Substrate Storage: Merklized
 This brings us again to why blockchain based systems tend to "merkelize" their storage.
v
Merklized
Substrate uses a base16, (patricia) radix merkle trie.
v
 Merkle tree.
 Typically contains values at leafs.
v
 Trie.
 Assuming only leafs have data, this is encoding:
"ac" => 0x12 
"ad" => 0x23 
"be" => 0x34 
Notes:
this is how we encode key value based data in a trie.
v
 Radix Tree.
 Less nodes to encode the same data.
"ac" => 0x1234 
"ad" => 0x1234 
"be" => 0x1234 
Notes:
More resources:
 https://en.wikipedia.org/wiki/Merkle_tree
 https://en.wikipedia.org/wiki/Radix_tree
 https://en.wikipedia.org/wiki/Trie
Namely:
The data structure was invented in 1968 by Donald R. Morrison, with whom it is primarily associated, and by Gernot Gwehenberger.
Donald Knuth, pages 498500 in Volume III of The Art of Computer Programming, calls these "Patricia's trees", presumably after the acronym in the title of Morrison's paper: "PATRICIA  Practical Algorithm to Retrieve Information Coded in Alphanumeric". Today, Patricia tries are seen as radix trees with radix equals 2, which means that each bit of the key is compared individually and each node is a twoway (i.e., left versus right) branch. v
Merklized
Substrate uses a base16, (patricia) radix merkle trie.
v
Merklized
 Substrate does in fact use a keyvalue based database under the hood..
 In order to store trie nodes, not direct storage keys!
 We take the storage key, and make it be the path on a trie.
 Then we store the trie nodes, referenced by their hash, in the main database.
v
Merklized
Notes:
imagine:
sp_io::storage::get(b"ad")
v
Merklized
Notes:
realistically, the storage key is something like (system_)16
, but I have put the strings here for
simplification.
Trie Walking Example
 We know the stateroot at a given block
n
.  assume this is a base26, patricia trie. English alphabet is the keyscope.
 Let's see the steps needed to read
balances_alice
from the storage.
v
v
v
v
v
Merklized: Proofs
 If alice only has this root, how can I prove to her how much balance she has?
v
Notes:
The important point is that for example the whole data under _system
is not hidden away behind one hash.
Dark blue are the proof, light blue's hashes are present.
Receiver will hash the root node, and check it against a publicly known storage root.
This differs slightly from how actual proof generation might work in the code.
In general, you have a tradeoff: send more data, but require less hashing on Alice, or opposite (this is what we call "compact proof").
v
Merklized: Proofs
 🏆 Small proof size is a big win for light clients, and Polkadot.
Merklized: Recap
 Storage key (whatever you pass to
sp_io
) is the path on the trie.
 Storage key is arbitrary length.
 Intermediary (branch) nodes could contain values.
:code
contains some value,:code:more
can also contain value.
 Storage Key != Database Key.
 1 Storage access = Many data base access.
Notes:
how many database access would do you think it is?
we will explain this in a few slides, but assuming an order N
tree, and assuming it is balanced,
it will be O(LOG_n)
.
Base 2, Base 16, Base26?
 Instead of alphabet, we use the base16 representation of everything.
Base16 (Patricia) Merkle Trie.
System
>73797374656d
:code
>3a636f646500
v
Base 2, Base 16, Base26?
Tradeoff: "IO count vs. Node size"
Between a light clint and a full node, which one cares more about which?
Notes:
Light client cares about node size. When proof is being sent, there is no IO.
First glance, the radix8 seems better: you will typically have less DB access to reach a key. For example, with binary, with 3 IO, we can reach only 8 items, but with radix8 512.
So why should not choose a very wide tree? Because the wider you make the tree, the bigger each node gets, because it has to store more hashes. At some point, this start to screw with both the proof size and the cost of reading/writing/encoding/decoding all these nodes.
v
Base 2, Base 16, Base26?
Note:
Here's a different way to represent it; the nodes are bigger on the base16 trie.
v
Base 2, Base 16, Base26?
 base2: Small proofs, more nodes.
 base8: Bigger proofs, less nodes.
✅ 16 has been benchmarked and studies years ago as a good middleground.
Notes:
Anyone interested in blockchain and research stuff should look into this.
Unbalanced Tree
v
Unbalanced Tree
 Unbalanced tree means unbalanced performance. An attack vector, if done right.
 More about this in FRAME storage, and how it is prevented there.
Notes:
 underestimate weight/gas etc.
 You as the runtime developer must ensure that you use the right keys.
 This is particularly an issue if an end user can control where they can insert into the trie!
 The main prevention is using a cryptographically secure hash function on the frame side.
WAIT A MINUTE... 🤔
 Two common scenarios that merkle proofs are kinda unfair:
 If the one of the parent nodes has some large data.
 If you want to prove the deletion/nonexistence of a leaf node.
v
v
WAIT A MINUTE... 🤔
New "tie format" 🌈:
 All data containing more than 32 bytes are replaced with their hash.
 The (larger than 32 bytes) value itself stored in the database under this hash.
#![allow(unused)] fn main() { struct RuntimeVersion { ... state_version: 0, } }
v
What is the ramification of this for full nodes, and light clients?
Notes:
Both read and write have an extra step now, but proof are easier.
Note from emeric: the green node is not really a "real" node, it is just { value: BIG_STUFF }
stored in the database.
I will skip this detail for the sake of simplicity.
One can assume that the green node is like any other node in the trie.
Substrate Storage: The Updated Picture
WAIT A MINUTE... 🤔
 We rarely care about state root and all the trie shenanigans before the end of the block...
A blockscoped cache for storage.
Notes:
In other words, one should care too much about updating a "trie" and all of its hashing details while the block is still being executed? All of that can be delayed.
Overlay
 Is a cache layer outside of the Runtime.
 It works based on keyvalues, not trieformat.
v
Overlay
 Almost identical semantic to your CPU cache:
 Once you read a value, it stays here, and can be reread for cheap.
 Once you write a value, it will only be written here.
 It can be read for cheap.
 All writes are flushed at the end of the runtime api call.
 No race conditions as runtime is singlethreaded.
v
v
v
v
v
v
v
Overlay
 Cheap != Free
Notes:
 In your code, you often have an option to either pass stack variables around, or reread code from
spio
. Most often, this is a microoptimization that won't matter too much, but in general you should know that the former is more performant, as won't go the the host at all.  A deletion is basically a write to
null
.
v
Overlay
 The overlay is also able to spawn childoverlays, known as "storage layer".
 Useful for having a transactional block of code.
#![allow(unused)] fn main() { // spawn a new layer. with_storage_layer( { let foo = sp_io::storage::read(b"foo"); sp_io::storage::set(b"bar", foo); if cond { Err("this will be reverted") } else { Ok("This will be commit to the top overlay") } }) }
Notes:
 implement with zerocopy. So, the size of values is not so important, it is more about the number.
v
v
v
Overlay
 There is a limit to how many nested layers you can spawn
 It is not free, thus it is attackable.
#![allow(unused)] fn main() { with_storage_layer( { let foo = sp_io::storage::read(b"foo"); with_storage_layer( { sp_io::storage::set(b"foo", b"foo"); with_storage_layer( { sp_io::storage::set(b"bar", foo); with_storage_layer( { sp_io::storage::set(b"foo", "damn"); Err("damn") }) Ok("what") }) Err("the") }); Ok("hell") }) }
v
Overlay
 What if I call
sp_io::storage::root()
in the middle of the block?  Can the overlay respond to this?
Notes:
NO! The overlay works on the level on keyvalues, it knows nothing of trie nodes, and to compute the root we have to go to the trie layer and pull a whole lot of data back from the disk and build all the nodes etc.
v
Overlay: More Caches
 There are more caches in the trie layer as well. But outside of the scope of this lecture.
./substrate help  grep cache
Notes:
https://www.youtube.com/embed/OoMPlJKUULY
Substrate Storage: Final Figure
v
Substrate Storage
There are multiple implementations of Externalities
:
TestExternalities
:Overlay
TrieDb
withInMemoryBackend
Ext
(the real thing )Overlay
TrieDb
with a real database being the backend
v
Substrate Storage
 Recall: Any code accessing host functions needs to wrapped in something that implements
Externalities
#![allow(unused)] fn main() { // ❌ let x = sp_io::storage::get(b"foo"); // error: // thread '..' panicked at '`get_version_1` called outside of an Externalitiesprovided environment.' }
#![allow(unused)] fn main() { // ✅ SomeExternalities.execute_with( { let x = sp_io::storage::get(b"foo"); }); }
State Pruning
 Each runtime will think that it has access to its full state, behind
sp_io::storage
.  Does the client then store one full trie perblock?
Surely not.
Notes:
 Only trie nodes that are updated from one block to the other are created as new DB Keys.
 For the unchanged ones, we only reference the existing one.
v
State Pruning
vState Pruning
vState Pruning
vState Pruning
v
State Pruning
 🧠 Data stored onchain, but rarely changed? De nada.
 State pruning is an entirely client side optimization,
Child Trees
v
Child Trees
 Stored on a different DB Column (asyncish bulk deletion).
 Most importantly, alternative trie formats.
Trie Format Matters!
 Recall that in our "trie walking", we took the state root, and got the root node from the DB.
 The state root of any substratebased chain, including Polkadot, is the hash of the "Trie Node".
Trie format matters! and therefore it is part of the polkadot spec.
Notes:
Meaning, if another client wants to sync polkadot, it should know the details of the trie format.
Lecture Summary/Recap:
 KVBased storage
 Merklized storage, and proofs
 Large nodes
 Radix order consequences
 Unbalanced trie
 State pruning
Additional Resources! 😋
Check speaker notes (click "s" 😉)
Notes:

Shawn's deep dive: https://www.shawntabrizi.com/substrate/substratestoragedeepdive/

Basti's talk on Trie caching: https://www.youtube.com/watch?v=OoMPlJKUULY

About state version:

An "old but gold" read about trie in Ethereum: https://medium.com/shyftnetwork/understandingtriedatabasesinethereum9f03d2c3325d

On optimizing substrate storage proofs: https://github.com/paritytech/substrate/issues/3782

Underlying trie library maintained by Parity: https://github.com/paritytech/trie

An interesting, but heretical idea: can the runtime of block N, access state of block N1? HELL. NO. It might sound like a "but why nooooot" type of situation, but it breaks down all assumptions about what a state transition is. The runtime is the state transition function. Recall the formula of that, and then you will know why this is not allowed.
Post Lecture Feedback
Double check the narrative and example of the BIG_STUFF
node.
An example/exercise of some sort
would be great, where students call a bunch of sp_io
functions, visualize the trie, and invoke
proof recorder, and see which parts of the trie is exactly part of the proof.