Designing DAG-based consensus
How to use the slides - Full screen (new tab)
Designing DAG-based consensus
Goals of this lecture
- formalize the consensus problem and related concepts
- provide a framework for designing DAG-based consensus protocols
What is consensus?
- a process of agreeing on the same result among a group of participants
- a fundamental problem in distributed computing
- a key component of blockchain technology stack
Consensus features
liveness, safety, integrity
We have already seen some
Nakamoto
Babe
Grandpa
Sassafras
Tendermint
...
Who is running the protocol?
Participants, called nodes
Nodes
- nodes can be either honest or malicious
- honest nodes follow the protocol
- malicious nodes can deviate from the protocol in any way they want
- malicious nodes can collude with each other
- malicious nodes can be controlled by an adversary
Public key infrastructure
- every node has its own private and public key
- every node signs messages with its private key
- every node verifies messages with other nodes' public keys
Public key infrastructure
authenticated point-to-point communication
Adversary
Adversary can control the network delays, but is computationally bounded, i.e. it cannot break the cryptography (like forging the signatures).
Network
Communication via network... but what kind of network?
Network models
synchronous
partially synchronous
asynchronous
Network models: synchronous
There exists a known upper bound \(\Delta\) on message delivery time.
Intuition: there's a well-defined notion of a protocol round
Network models: asynchronous
There is no upper bound on message delay, though delivery is guaranteed.
Intuition: you can't tell whether a node has crashed or has a long delay
Network models: asynchronous
There is no upper bound on message delay, though delivery is guaranteed.
- We assume that the adversary has full control over the message delays.
- The concept of a timeout is basically useless.
Network models: partially synchronous
There exists a known bound \(\Delta\), and an unknown point in time GST after which the communication becomes synchronous with a delay \(\Delta\).
Intuition: protocol will eventually work synchronously, but it needs to be safe before
Crucial theoretical results
[FLP theorem] It is impossible to have a deterministic protocol that solves consensus in an asynchronous system in which at least one process may fail by crashing.
[Castro-Liskov theorem] It is impossible to have a protocol that solves consensus in a partially synchronous system with \(3f+1\) nodes in which more than \(f\) processes are byzantine.
Crucial theoretical results
[FLP theorem] It is impossible to have a deterministic protocol that solves consensus in an asynchronous system in which at least one process may fail by crashing.
[Castro-Liskov theorem] It is impossible to have a protocol that solves consensus in a partially synchronous system with \(3f+1\) nodes in which more than \(f\) processes are byzantine.
Consequence
The best one can hope for in asynchronous scenario is probabilistic protocol tolerating up to \(f\) faults for \(3f+1\) participants.
✅ Doable!
Note on randomness
Real probability is actually needed in the extremely hostile environment. In case where the adversary is not legendarily vicious, even a dumb (but non-trivial) randomness source will do.
Responsiveness
Responsiveness
Protocols that are not responsive have to wait for \(\Delta\) time to proceed to the next round.
Responsiveness
Protocols that are not responsive have to wait for \(\Delta\) time to proceed to the next round.
- \(\Delta\) must be long enough to allow all honest nodes to send their messages.
- \(\Delta\) must be short enough to allow the protocol to make progress.
- In case of failure, they have to perform a pretty expensive recovery procedure (like the leader change).
Responsiveness
Protocols that are responsive wait for \(2f+1\) messages to proceed to the next round.
Why \(2f+1\)?
Responsiveness
Protocols that are responsive wait for \(2f+1\) messages to proceed to the next round.
Among \(2f+1\) nodes, there are at least \(f+1\) honest ones, i.e. honest majority.
Responsiveness
Protocols that are responsive wait for \(2f+1\) messages to proceed to the next round.
- Asynchronous protocols must be responsive.
- In good network conditions, they significantly much faster.
Checkpoint
Up to this point, we covered:
- consensus problem
- node types and adversary
- inter-node communication
- network models (synchronicity)
- protocol limitations in asynchronous network (honesty fraction and the need for randomness)
- responsiveness
Warmup exercise: broadcast
(In an asynchronous network) reliably send a single message to all other nodes.
- (validity) If the sender is honest and broadcasts a message \(m\), then every honest node outputs \(m\).
- (integrity) If an honest node outputs a message \(m\), then it must have been broadcast by the sender.
- (agreement) If an honest node outputs a message \(m\), every other honest node outputs \(m\).
Reliable broadcast protocol (RBC)
Reliable broadcast in practice
Due to the very high communication complexity we use heuristics or cryptography-based tricks.
Blockchain protocol vs Atomic broadcast
Atomic broadcast
Randomness formalized
Randomness beacon
Atomic broadcast: timeline
Atomic broadcast: timeline
Fun fact
Aleph paper, as the first, also achieved fully asynchronous randomness beacon:
- with efficient setup (\(O(1)\) rounds, \(O(N^2)\) communication)
- with \(O(1)\) expected rounds to output a random value with \(O(N)\) communication per round
Consensus protocols (selection)
Classical protocols:
- [DLS’88], [CR’92],
- PBFT [CL’99]
- Random Oracles … [CKS’05]
- Honey Badger BFT [MXCSS’16]
- Tendermint [BKM’18]
- VABA [AMS’19]
- Flexible BFT [MNR’19]
- HotStuff [YMRGA’19]
- Streamlet [CS’20]
- Grandpa [SKK'20]
DAG-based protocols:
- [L. Moser, P. Meliar-Smith ‘99]
- Hashgraph [B’16]
- Aleph [GLSS’18]
- DAG-Rider [KKNS’21]
- Highway [KFGS’21]
- Narwhal&Tusk [DKSS’22]
- Bullshark [SGSK’22]
DAG-based protocols
DAG
Directed Acyclic Graph
How does it relate to consensus?
Intuition: graph represents the dependencies between messages (units).
Framework core
- We maintain a local DAG representing our knowledge of the units.
- We perform a local, offline consensus on our DAG.
Framework core
- We maintain a local DAG representing our knowledge of the units.
- We perform a local, offline consensus on our DAG.
Framework core (in other words)
- (online): sending and receiving units that contribute to the local DAG
- (offline): everybody performs a local consensus on the DAG, just by looking at it
Clue observations
- local DAGs might differ...
- but they are guaranteed to converge to the same DAG
- the offline consensus is guaranteed to produce the same result
Adversary control
Randomness? Where is randomness?
It is put into the local consensus protocol.
Relation to the atomic consensus problem
- nodes receive transactions and put them into units
- nodes send each other their new units
- (locally) nodes come up with a linear ordering of the units and make blocks from chunks
Digression: block production, information dissemination and finalization
The common approach (e.g. in Substrate):
- production and dissemination is done in the same layer
- afterwards, nodes perform consensus on finalizing disseminated blocks
Natural approach for DAG-based protocols:
- information dissemination happens as 'the first phase'
- block building and (instant) finalization happens locally
Main consequences of the different separation
- block signatures
- speed
Local consensus: goal
Local copies might differ significantly, blocks might have not come to all nodes yet, etc... but we have to make common decision about unit ordering!
Key concept: availability
Intuitively, a unit is available if:
- most of the nodes have it
- it was distributed pretty promptly (we won't call a unit available, if it finally arrived everywhere after a month)
- most of the nodes know that most of the nodes know that most of the nodes know... that it is available (mutual awareness)
Availability
If a unit is available, it is a good candidate for being chosen as an 'anchor' in extending current ordering.
Lightweight case study
Aleph Zero BFT protocol
Head
Building blocks
Choosing head
Availability determination
Units vote for each other's availability.
(Part of) availability determination
VoteU(V) =
- [[U is parent of V]] if V is from the round just after the round of U
0
/1
if all children of U voted0
/1
CommonVote(round(
U), round(
V))
otherwise
(U comes from the earlier round than V)
Bonus: generating randomness
Sig sk
(nonce)
- randomness must be unpredictable
- delayed reveal
- must depend on \(f+1\) nodes
- cannot be disturbed by the adversary
Standard way
Standard way
Problem: need for trusted dealer!
One simple trick
Everybody is dealing secrets