Hash Based Data Structures

How to use the slides - Full screen (new tab) Slides Content --- title: Hash Based Data Structures description: Hash functions in Substrate duration: 1 hour ---

Hash Based Data Structures

Comparison to
Pointer Based Data Structures

A hash references the content of some data;
A pointer tells you where to find it;
We can not have cycles of hashes.

Hash Chains

A hash chain is a linked list using hashes to connect nodes.

Notes:

Each block has the hash of the previous one.

Merkle Trees

A binary Merkle tree is a binary tree using hashes to connect nodes.

Notes:

Ralph Merkle is a Berkeley alum!

Proofs

The root or head hash is a commitment to the entire data structure.
Generate a proof by expanding some but not all hashes.

Crucial for the trustless nature of decentralised cryptographic data systems!

Given the children of a node, we can compute a node Given the purple nodes and the white leaf, we can compute the white nodes bottom to top. If we compute the correct root, this proves that the leaf was in the tree

Security

Collision resistance: we reasonably assume only one preimage for each hash,
therefore making the data structure's linkage persistent and enduring (until the cryptography becomes compromised 😥).

Notes:

Explain what could happen when this fails.

Proof Sizes

Proof of a leaf has size $O(\log n)$
and so do proofs of updates of a leaf

Key-Value Databases and Tries

Key-value database

The data structure stores a map key -> value.
We should be able to:

put(key, value)
get(key)
delete(key)

Provability in key-value databases

We should also be able to perform the following operations for a provable key-value database:

For any key, if <key,value> is in the database, we can prove it.
If no value is associated to a key, we need to be able to prove that as well.

Types of Data Structures

Trees are rooted, directed acyclic graphs where each child has only one parent.
Merkle Trees are trees which use hashes as links.
Tries are a particular class of trees where:
- Given a particular piece of data, it will always be on a particular path.
Radix Tries are a particular class of a trie where:
- The location of a value is determined the path constructed one digit at a time.
Patricia Tries are radix tries which are optimized to ensure lonely node-paths are consolidated into a single node.

Notes:

Just a selection we'll cover in this course.

Radix Trie

Words: to, tea, ted, ten, inn, A.

Each node splits on the next digit in base $r$

Notes:

In this image, $r$ is 52 (26 lowercase + 26 uppercase).

Patricia Trie

Words: to, tea, ted, ten, inn, A.

If only one option for a sequence we merge them.

Patricia Trie Structures

#![allow(unused)]
fn main() {
pub enum Node {
  Leaf {
    partial_path: Slice<RADIX>,
    value: Value
  },
  Branch {
    partial_path: Slice<RADIX>,
    children: [Option<Hash>; RADIX],
    value: Option<Value>,
  },
}
}

Notes:

The current implementation actually makes use of dedicated "extension" nodes instead of branch nodes that hold a partial path. There's a good explanation of them here.

Additionally, if the size of a value is particularly large, it is replaced with the hash of its value.

Hash Trie

Inserting arbitrary (or worse, user-determined) keys into the Patricia tree can lead to highly unbalanced branches, enlarging proof-sizes and lookup times.
Solution: pre-hash the data before inserting it to make keys random.
Resistance against partial collision is important.
Could be a Merkle trie or regular.

Computational and Storage
Trade-offs

What radix $r$ is best?

Proof size of a leaf is $r \log_r n$
- $r=2$ gives the smallest proof for one leaf

...but:

Higher branching at high levels of the tree can give smaller batch proofs.
For storage, it is best to read consecutive data so high $r$ is better.

Merkle Mountain Ranges

Efficient proofs and updates for a hash chain
Append only data structure
Lookup elements by number

Not as balanced as a binary tree but close
Can update the peak nodes alone on-chain

Polkadot Blockchain Academy

Hash Based Data Structures

Hash Based Data Structures

Comparison to
Pointer Based Data Structures

Hash Chains

Merkle Trees

Proofs

Proofs: Merkle Copaths

Security

Proof Sizes

Key-Value Databases and Tries

Key-value database

Provability in key-value databases

Types of Data Structures

Radix Trie

Patricia Trie

Patricia Trie Structures

Hash Trie

Computational and Storage
Trade-offs

Merkle Mountain Ranges

Merkle Mountain Ranges

Merkle Mountain Ranges

Merkle Mountain Ranges

Questions