SCALE Codec

---
title: SCALE Codec
description: SCALE Codec for web3 builders
duration: 1 hour
---

By the end of this lecture, you will know why Substrate uses the SCALE codec and how different kinds of data types are encoded.

SCALE

Simple Concatenated Aggregate Little-Endian

SCALE is a lightweight encoding (and decoding) format, which makes it highly suitable for resource-constrained execution environments like blockchain runtimes and low-power, low-memory devices.

Little-Endian

Little endian systems store the least significant byte at the smallest memory address.

Wasm is a little endian system, which makes SCALE very performant.

Why SCALE? Why not X?

- Simple to define.
- Not Rust-specific (but happens to work great in Rust).
  - Easy to derive codec logic: #[derive(Encode, Decode)]
  - Viable and useful for APIs like: MaxEncodedLen and TypeInfo
  - It does not use Rust std, and thus can compile to Wasm no_std.
- Consensus critical / bijective; one value will always encode to one blob and that blob will only decode to that value.
- Supports a copy-free decode for basic types on LE architectures.
- It is about as thin and lightweight as can be.

SCALE is NOT Self-Descriptive

It is important to note that the encoding context (knowledge of how the types and data structures look) needs to be known separately at both encoding and decoding ends.

The encoded data does not include this contextual information.

Example: SCALE vs JSON

use parity_scale_codec::{ Encode };

#[derive(Encode)]
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

fn main() {
	let my_struct = Example {
		number: 42,
		is_cool: true,
		optional: Some(69),
	};
	println!("{:?}", my_struct.encode());
	println!("{:?}", my_struct.encode().len());
}

[42, 1, 1, 69, 0, 0, 0]
7

use serde::{ Serialize };

#[derive(Serialize)]
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

fn main() {
	let my_struct = Example {
		number: 42,
		is_cool: true,
		optional: Some(69),
	};
	println!("{:?}", serde_json::to_string(&my_struct).unwrap());
	println!("{:?}", serde_json::to_string(&my_struct).unwrap().len());
}

"{\"number\":42,\"is_cool\":true,\"optional\":69}"
42

Try It Yourself!

mkdir temp
cd temp
cargo init
cargo add parity-scale-codec --features derive

Little vs Big Endian Output

It can be confusing to read the output while keeping endianness in mind.

The order of the bytes in the vector follows endianness, but the hex and binary representation of each individual byte is the same, independent of endianness.

0b prefix denotes a binary representation, and 0x denotes a hex representation.

fn main() {
	println!("{:b}", 69i8);
	println!("{:02x?}", 69i8.to_le_bytes());
	println!("{:02x?}", 69i8.to_be_bytes());
	println!("{:b}", 42u16);
	println!("{:02x?}", 42u16.to_le_bytes());
	println!("{:02x?}", 42u16.to_be_bytes());
	println!("{:b}", 16777215u32);
	println!("{:02x?}", 16777215u32.to_le_bytes());
	println!("{:02x?}", 16777215u32.to_be_bytes());
}

1000101
[45]
[45]
101010
[2a, 00]
[00, 2a]
111111111111111111111111
[ff, ff, ff, 00]
[00, ff, ff, ff]

Fixed Width Integers

Basic integers are encoded using a fixed-width little-endian (LE) format.

use parity_scale_codec::Encode;

fn main() {
	println!("{:02x?}", 69i8.encode());
	println!("{:02x?}", 69u8.encode());
	println!("{:02x?}", 42u16.encode());
	println!("{:02x?}", 16777215u32.encode());
}

[45]
[45]
[2a, 00]
[ff, ff, ff, 00]

Notes:

Notice that the first two outputs are the same. SCALE is NOT descriptive of the type. The decoder is responsible for decoding this byte into some one-byte-wide type, be it u8, i8, or something else.
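To make the note above concrete, here is a minimal std-only sketch (no SCALE crate involved). The helper `interpret` is a hypothetical name used only for illustration; it reads one encoded byte back as both one-byte integer types.

```rust
/// Hypothetical helper: interpret one encoded byte as u8 and as i8.
fn interpret(byte: u8) -> (u8, i8) {
    (u8::from_le_bytes([byte]), i8::from_le_bytes([byte]))
}

fn main() {
    // 0x45 is the byte produced by both 69i8 and 69u8 above.
    println!("{:?}", interpret(0x45)); // (69, 69)
    // For 0xff the two interpretations disagree; the data itself cannot tell
    // the decoder which one was meant.
    println!("{:?}", interpret(0xff)); // (255, -1)
}
```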

Compact Integers

A "compact" or general integer encoding is sufficient for encoding large integers (up to 2⁵³⁶) and is more efficient at encoding most values than the fixed-width version.

Though for single-byte values, the fixed-width integer is never worse.

Compact Prefix

| Prefix | Mode | Encoding | Valid values |
| --- | --- | --- | --- |
| `0b00` | single-byte | Upper six bits are the LE encoding of the value. | `0` through `63` |
| `0b01` | two-byte | Upper six bits and the following byte are the LE encoding of the value. | `64` through `(2^14 - 1)` |
| `0b10` | four-byte | Upper six bits and the following three bytes are the LE encoding of the value. | `(2^14)` through `(2^30 - 1)` |
| `0b11` | big-integer | Upper six bits are the number of bytes following, minus four (so `0` means four bytes follow). The value is contained, LE encoded, in the bytes following. The final (most significant) byte must be non-zero. | `(2^30)` through `(2^536 - 1)` |

Compact/general integers are encoded with the two least significant bits of the first byte denoting the mode.
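The four modes can be sketched by hand in std-only Rust. This is an illustrative sketch, not the code used by parity-scale-codec: the function name `compact_encode` is hypothetical, and it only covers values that fit in a `u128`, whereas the format itself goes up to 2^536 - 1.

```rust
/// Hand-rolled sketch of compact encoding for a u128 value (not the library code).
fn compact_encode(value: u128) -> Vec<u8> {
    if value < (1 << 6) {
        // 0b00: single-byte mode, the value sits in the upper six bits.
        vec![(value as u8) << 2]
    } else if value < (1 << 14) {
        // 0b01: two-byte mode.
        (((value as u16) << 2) | 0b01).to_le_bytes().to_vec()
    } else if value < (1 << 30) {
        // 0b10: four-byte mode.
        (((value as u32) << 2) | 0b10).to_le_bytes().to_vec()
    } else {
        // 0b11: big-integer mode. Drop the most significant zero bytes so the
        // final byte is non-zero, then store (byte count - 4) in the upper six bits.
        let mut bytes = value.to_le_bytes().to_vec();
        while bytes.last() == Some(&0u8) {
            bytes.pop();
        }
        let mut out = vec![(((bytes.len() - 4) as u8) << 2) | 0b11];
        out.extend(bytes);
        out
    }
}

fn main() {
    for value in [0u128, 42, 69, 65535, 1 << 30] {
        println!("{:02x?}", compact_encode(value));
    }
}
```

The first four values are the ones worked through on the following slides; the last one, 2^30, is the smallest value that needs big-integer mode and comes out as `[03, 00, 00, 00, 40]`.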

Compact Integers: 0

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 0u8.encode());
	println!("{:02x?}", 0u32.encode());
	println!("{:02x?}", AsCompact(0u8).encode());
	println!("{:02x?}", AsCompact(0u32).encode());
}

[00]
[00, 00, 00, 00]
[00]
[00]

Compact Integers: 42

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 42u8.encode());
	println!("{:02x?}", 42u32.encode());
	println!("{:02x?}", AsCompact(42u8).encode());
	println!("{:02x?}", AsCompact(42u32).encode());
}

[2a]
[2a, 00, 00, 00]
[a8]
[a8]

42 as binary: 0b101010 = [0x2a].
Shift left by two bits and put the mode 00 in the two least significant bits.
0b10101000 = [0xa8] = 168 as decimal.

Compact Integers: 69

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 69u8.encode());
	println!("{:02x?}", 69u32.encode());
	println!("{:02x?}", AsCompact(69u8).encode());
	println!("{:02x?}", AsCompact(69u32).encode());
}

[45]
[45, 00, 00, 00]
[15, 01]
[15, 01]

69 as binary: 0b1000101 = [0x45].
Shift left by two bits and put the mode 01 in the two least significant bits.
0b100010101 = [0x15, 0x01] = 277 as decimal.

Compact Integers: 65535 (u16::MAX)

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 65535u16.encode());
	println!("{:02x?}", 65535u32.encode());
	println!("{:02x?}", AsCompact(65535u16).encode());
	println!("{:02x?}", AsCompact(65535u32).encode());
}

[ff, ff]
[ff, ff, 00, 00]
[fe, ff, 03, 00]
[fe, ff, 03, 00]

65535 as binary: 0b1111111111111111 = [0xff, 0xff].
Shift left by two bits and put the mode 10 in the two least significant bits.
0b111111111111111110 = [0xfe, 0xff, 0x03, 0x00] = 262142 as decimal.

Compact Integers Are "Backwards Compatible"

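As you can see, the compact encoding of a value does not depend on the width of its type, so you are able to "upgrade" a type (for example from u8 to u32) without affecting the encoding.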
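The decoding direction shows why this works: the mode bits alone tell the reader how many bytes to consume, so the width of the producer's type never appears in the data. Below is a minimal std-only sketch; `compact_decode` is a hypothetical name, not the library API, and it stops at values that fit in a `u128`.

```rust
use std::convert::TryFrom;

/// Hand-rolled sketch of compact decoding (not the library code).
/// Returns the value and the number of bytes consumed, or None on bad input.
fn compact_decode(input: &[u8]) -> Option<(u128, usize)> {
    let first = *input.first()?;
    // The two least significant bits of the first byte select the mode.
    let len = match first & 0b11 {
        0b00 => 1,
        0b01 => 2,
        0b10 => 4,
        _ => {
            // Big-integer mode: the upper six bits hold (byte count - 4).
            let n = (first >> 2) as usize + 4;
            if n > 16 {
                return None; // this sketch stops at u128
            }
            let body = input.get(1..1 + n)?;
            let mut buf = [0u8; 16];
            buf[..n].copy_from_slice(body);
            return Some((u128::from_le_bytes(buf), 1 + n));
        }
    };
    let body = input.get(..len)?;
    let mut buf = [0u8; 16];
    buf[..len].copy_from_slice(body);
    // Drop the two mode bits to recover the value.
    Some((u128::from_le_bytes(buf) >> 2, len))
}

fn main() {
    // The bytes shown above for both AsCompact(69u8) and AsCompact(69u32).
    let bytes = [0x15u8, 0x01];
    let (value, used) = compact_decode(&bytes).expect("valid compact integer");
    // The same bytes fit the old, narrow type and the new, wider one.
    println!("{:?} {:?} ({} bytes)", u8::try_from(value), u32::try_from(value), used);
}
```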

Enum

Prefix with index (u8), then the value, if any.

use parity_scale_codec::Encode;

#[derive(Encode)]
enum Example {
	First,
	Second(u8),
	Third(Vec<u8>),
	Fourth,
}

fn main() {
	println!("{:02x?}", Example::First.encode());
	println!("{:02x?}", Example::Second(2).encode());
	println!("{:02x?}", Example::Third(vec![0, 1, 2, 3, 4]).encode());
	println!("{:02x?}", Example::Fourth.encode());
}

[00]
[01, 02]
[02, 14, 00, 01, 02, 03, 04]
[03]

Tuple and Struct

Just encode and concatenate the items.

use parity_scale_codec::Encode;

#[derive(Encode)]
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

fn main() {
	let my_struct = Example {
		number: 0,
		is_cool: true,
		optional: Some(69),
	};
	println!("{:02x?}", (0u8, true, Some(69u32)).encode());
	println!("{:02x?}", my_struct.encode());
}

[00, 01, 01, 45, 00, 00, 00]
[00, 01, 01, 45, 00, 00, 00]

Notes:

Note that the tuple and the struct encode identically, even though the struct has named fields.
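A hand-written version makes it clear why the names do not matter: only the field values, in declaration order, reach the output. This is a std-only sketch with hypothetical helper names (`encode_option_u32`, `encode_fields`), not the derive's generated code.

```rust
/// Sketch of an Option<u32> encoder: 0x00 for None, 0x01 plus the value for Some.
fn encode_option_u32(value: Option<u32>) -> Vec<u8> {
    match value {
        None => vec![0],
        Some(n) => {
            let mut out = vec![1];
            out.extend_from_slice(&n.to_le_bytes());
            out
        }
    }
}

/// Sketch: fields are encoded in declaration order and concatenated.
/// No field names, separators, or length markers are written.
fn encode_fields(number: u8, is_cool: bool, optional: Option<u32>) -> Vec<u8> {
    let mut out = vec![number, is_cool as u8];
    out.extend(encode_option_u32(optional));
    out
}

fn main() {
    println!("{:02x?}", encode_fields(0, true, Some(69)));
    println!("{:02x?}", encode_fields(0, true, None));
}
```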

Embedded Compact

use parity_scale_codec::Encode;

#[derive(Encode)]
struct Example {
	number: u64,
	#[codec(compact)]
	compact_number: u64,
}

#[derive(Encode)]
enum Choices {
	One(u64, #[codec(compact)] u64),
}

fn main() {
	let my_struct = Example { number: 42, compact_number: 1337 };
	let my_choice = Choices::One(42, 1337);
	println!("{:02x?}", my_struct.encode());
	println!("{:02x?}", my_choice.encode());
}

[2a, 00, 00, 00, 00, 00, 00, 00, e5, 14]
[00, 2a, 00, 00, 00, 00, 00, 00, 00, e5, 14]

Unit, Bool, Option, and Result

use parity_scale_codec::Encode;

fn main() {
	println!("{:02x?}", ().encode());
	println!("{:02x?}", true.encode());
	println!("{:02x?}", false.encode());
	println!("{:02x?}", Ok::<u32, ()>(42u32).encode());
	println!("{:02x?}", Err::<u32, ()>(()).encode());
	println!("{:02x?}", Some(42u32).encode());
	println!("{:02x?}", None::<u32>.encode());
}

[]
[01]
[00]
[00, 2a, 00, 00, 00]
[01]
[01, 2a, 00, 00, 00]
[00]

Arrays, Vectors, and Strings

Arrays: Just concatenate the items.
Vectors: Also prefix with the length (compact encoded).
Strings: Encoded as a Vec<u8> holding the UTF-8 bytes.

use parity_scale_codec::Encode;

fn main() {
	println!("{:02x?}", [0u8, 1u8, 2u8, 3u8, 4u8].encode());
	println!("{:02x?}", vec![0u8, 1u8, 2u8, 3u8, 4u8].encode());
	println!("{:02x?}", "hello".encode());
	println!("{:02x?}", vec![0u8; 1024].encode());
}

[00, 01, 02, 03, 04]
[14, 00, 01, 02, 03, 04]
[14, 68, 65, 6c, 6c, 6f]
[01, 10, 00, 00, ... snip ... , 00]

Notes:

Note that the length prefix can be multiple bytes, as in the last example.
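The length prefix can be reproduced with a few lines of std-only Rust. This is a sketch under stated assumptions, not the library implementation: `compact_len` and `encode_bytes` are hypothetical names, and only lengths below 2^30 (the first three compact modes) are handled.

```rust
/// Sketch of the compact length prefix for lengths below 2^30 (not the library code).
fn compact_len(len: usize) -> Vec<u8> {
    assert!(len < (1 << 30), "sketch only covers the first three modes");
    let len = len as u32;
    if len < (1 << 6) {
        vec![(len as u8) << 2]
    } else if len < (1 << 14) {
        (((len as u16) << 2) | 0b01).to_le_bytes().to_vec()
    } else {
        ((len << 2) | 0b10).to_le_bytes().to_vec()
    }
}

/// Sketch: a byte vector (or string) is its compact length followed by the items.
fn encode_bytes(items: &[u8]) -> Vec<u8> {
    let mut out = compact_len(items.len());
    out.extend_from_slice(items);
    out
}

fn main() {
    println!("{:02x?}", encode_bytes("hello".as_bytes()));
    let big = encode_bytes(&[0u8; 1024]);
    // 1024 no longer fits in six bits, so the prefix switches to two-byte mode.
    println!("{:02x?} ... ({} bytes in total)", &big[..2], big.len());
}
```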

Decoding

We can similarly take raw bytes and decode them into a well-known type.

Metadata can be used to convey to a program how to decode a type properly...

But bad or no information means the proper format for the data cannot be known.

Decoding Examples

use parity_scale_codec::{ Encode, Decode, DecodeAll };

fn main() {
	let array = [0u8, 1u8, 2u8, 3u8];
	let value: u32 = 50462976;

	println!("{:02x?}", array.encode());
	println!("{:02x?}", value.encode());
	println!("{:?}", u32::decode(&mut &array.encode()[..]));
	println!("{:?}", u16::decode(&mut &array.encode()[..]));
	println!("{:?}", u16::decode_all(&mut &array.encode()[..]));
	println!("{:?}", u64::decode(&mut &array.encode()[..]));
}

[00, 01, 02, 03]
[00, 01, 02, 03]
Ok(50462976)
Ok(256)
Err(Error { cause: None, desc: "Input buffer has still data left after decoding!" })
Err(Error { cause: None, desc: "Not enough data to fill buffer" })

Notes:

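Decoding can fail.
Values can decode badly: the same four bytes are a valid u32 (50462976) and, if only the first two are read, a valid u16 (256).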
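Both failure modes come down to how many bytes the target type asks for. The following std-only sketch imitates that behaviour; `take` and `decode_u16` are hypothetical helpers, and the error strings are made up for this example rather than taken from the library.

```rust
/// Sketch of a fixed-width read: copy N bytes from the front of `input`
/// and advance it, or fail if the input is too short.
fn take<const N: usize>(input: &mut &[u8]) -> Result<[u8; N], &'static str> {
    if input.len() < N {
        return Err("not enough data");
    }
    let (head, rest) = input.split_at(N);
    *input = rest;
    let mut out = [0u8; N];
    out.copy_from_slice(head);
    Ok(out)
}

/// Sketch: a u16 is two bytes, little-endian.
fn decode_u16(input: &mut &[u8]) -> Result<u16, &'static str> {
    take::<2>(input).map(u16::from_le_bytes)
}

fn main() {
    let bytes = [0u8, 1, 2, 3];

    // Enough data, but two bytes are left over. A stricter decoder
    // (like `decode_all` above) would reject this.
    let mut input = &bytes[..];
    let value = decode_u16(&mut input);
    println!("{:?}, {} bytes left", value, input.len());

    // Asking for eight bytes from a four-byte input fails outright.
    let mut input = &bytes[..];
    println!("{:?}", take::<8>(&mut input).map(u64::from_le_bytes));
}
```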

Decode Limits

Decoding isn't free!
The more complex the type being decoded, the more computation is needed to decode the value.
In general, you should always decode with decode_with_depth_limit.
Substrate uses a limit of 256.

Decode Bomb

Here is an example of a decode bomb.

use parity_scale_codec::{ Encode, Decode, DecodeLimit };

#[derive(Encode, Decode, Debug)]
enum Example {
	First,
	Second(Box<Self>),
}

fn main() {
	let bytes = vec![1, 1, 1, 1, 1, 0];
	println!("{:?}", Example::decode(&mut &bytes[..]));
	println!("{:?}", Example::decode_with_depth_limit(10, &mut &bytes[..]));
	println!("{:?}", Example::decode_with_depth_limit(3, &mut &bytes[..]));
}

Ok(Second(Second(Second(Second(Second(First))))))
Ok(Second(Second(Second(Second(Second(First))))))
Err(Error { cause: Some(Error { cause: Some(Error { cause: Some(Error { cause: Some(Error { cause: None, desc: "Maximum recursion depth reached when decoding" }), desc: "Could not decode `Example::Second.0`" }), desc: "Could not decode `Example::Second.0`" }), desc: "Could not decode `Example::Second.0`" }), desc: "Could not decode `Example::Second.0`" })
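The idea behind a depth limit can be sketched without the library: the decoder carries a budget and spends one unit each time it descends into a nested value. This std-only sketch uses a hypothetical `decode` function and made-up error strings; exactly where the real `decode_with_depth_limit` counts a level may differ from what is shown here.

```rust
#[derive(Debug, PartialEq)]
enum Example {
    First,
    Second(Box<Example>),
}

/// Sketch of depth-limited decoding (not the library code): every nested
/// `Second` spends one unit of the remaining depth budget.
fn decode(input: &mut &[u8], depth_left: u32) -> Result<Example, &'static str> {
    let (&index, rest) = input.split_first().ok_or("not enough data")?;
    *input = rest;
    match index {
        0 => Ok(Example::First),
        1 => {
            if depth_left == 0 {
                return Err("maximum recursion depth reached");
            }
            let inner = decode(input, depth_left - 1)?;
            Ok(Example::Second(Box::new(inner)))
        }
        _ => Err("unknown variant index"),
    }
}

fn main() {
    let bytes = [1u8, 1, 1, 1, 1, 0];
    // Five levels of nesting fit in a budget of 10, but not in a budget of 3.
    println!("{:?}", decode(&mut &bytes[..], 10));
    println!("{:?}", decode(&mut &bytes[..], 3));
}
```

Without such a budget, an attacker-chosen input made of many `1` bytes would drive the recursion as deep as the input is long.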

Exceptions: BTreeSet

A BTreeSet will decode from an unordered sequence of items, but it orders them as a result, so re-encoding the set can produce different bytes than the input.

Be careful... this one isn't bijective.

use parity_scale_codec::{ Encode, Decode, alloc::collections::BTreeSet };

fn main() {
	let vector = vec![4u8, 3u8, 2u8, 1u8, 0u8];
	let vector_encoded = vector.encode();
	let btree = BTreeSet::<u8>::decode(&mut &vector_encoded[..]).unwrap();
	let btree_encoded = btree.encode();

	println!("{:02x?}", vector_encoded);
	println!("{:02x?}", btree_encoded);
}

[14, 04, 03, 02, 01, 00]
[14, 00, 01, 02, 03, 04]
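The reordering comes from the set itself, which always iterates in sorted order. The std-only sketch below shows this and adds a hypothetical guard, `is_canonical`, that a caller could run on the decoded items when it needs the encoding to round-trip. The guard is an illustration, not part of the SCALE crate.

```rust
use std::collections::BTreeSet;

/// Hypothetical guard: true if the items are already in the strictly
/// increasing order that a BTreeSet produces when iterated.
fn is_canonical(items: &[u8]) -> bool {
    items.windows(2).all(|pair| pair[0] < pair[1])
}

fn main() {
    let unordered = vec![4u8, 3, 2, 1, 0];
    // Collecting into a BTreeSet sorts the items.
    let set: BTreeSet<u8> = unordered.iter().copied().collect();
    let reordered: Vec<u8> = set.into_iter().collect();
    println!("{:?} {}", unordered, is_canonical(&unordered)); // [4, 3, 2, 1, 0] false
    println!("{:?} {}", reordered, is_canonical(&reordered)); // [0, 1, 2, 3, 4] true
}
```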

Optimizations and Tricks

DecodeLength: Read the length of a collection (like Vec) without decoding everything.
EncodeAppend: Append an item to a collection (like Vec) without decoding all the other items.
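Both tricks rely on the compact length prefix sitting at the front of the encoded collection. The following std-only sketch shows the idea for an encoded `Vec<u8>`. The names `read_len`, `write_len`, and `append_byte` are hypothetical, and this is not how the `DecodeLength` and `EncodeAppend` traits are implemented. Only lengths below 2^30 are handled.

```rust
/// Sketch: read a compact length prefix in one of the first three modes.
/// Returns (length, prefix size in bytes) without touching the items.
fn read_len(encoded: &[u8]) -> Option<(u32, usize)> {
    let first = *encoded.first()?;
    let size = match first & 0b11 {
        0b00 => 1,
        0b01 => 2,
        0b10 => 4,
        _ => return None, // big-integer mode is out of scope for this sketch
    };
    let mut buf = [0u8; 4];
    buf[..size].copy_from_slice(encoded.get(..size)?);
    Some((u32::from_le_bytes(buf) >> 2, size))
}

/// Sketch: write a compact length prefix for lengths below 2^30.
fn write_len(len: u32) -> Vec<u8> {
    assert!(len < (1 << 30));
    if len < (1 << 6) {
        vec![(len as u8) << 2]
    } else if len < (1 << 14) {
        (((len as u16) << 2) | 0b01).to_le_bytes().to_vec()
    } else {
        ((len << 2) | 0b10).to_le_bytes().to_vec()
    }
}

/// Sketch of appending to an encoded Vec<u8>: only the prefix is rewritten,
/// the existing items are copied as raw bytes and never decoded.
fn append_byte(encoded: &mut Vec<u8>, item: u8) -> Option<()> {
    let (len, size) = read_len(encoded.as_slice())?;
    let mut out = write_len(len + 1);
    out.extend_from_slice(&encoded[size..]);
    out.push(item);
    *encoded = out;
    Some(())
}

fn main() {
    // The encoding of vec![0u8, 1, 2, 3, 4] from the earlier example.
    let mut encoded: Vec<u8> = vec![0x14, 0, 1, 2, 3, 4];
    println!("{:?}", read_len(&encoded)); // Some((5, 1))
    append_byte(&mut encoded, 5).expect("supported length prefix");
    println!("{:02x?}", encoded); // [18, 00, 01, 02, 03, 04, 05]
}
```

Note that the prefix can grow when the new length crosses a mode boundary (for example from 63 to 64 items), which is why the sketch rewrites the prefix instead of patching it in place.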

Implementations

SCALE Codec has been implemented in other languages, including:

Python: polkascan/py-scale-codec
Golang: itering/scale.go
C: MatthewDarnell/cScale
C++: soramitsu/scale-codec-cpp
JavaScript: polkadot-js/api
TypeScript: scale-ts
AssemblyScript: LimeChain/as-scale-codec
Haskell: airalab/hs-web3
Java: emeraldpay/polkaj
Ruby: wuminzhe/scale_rb

Polkadot Blockchain Academy

Missing Some Metadata?

Remember, at the end of the day, everything is just 0's and 1's.

Additional Resources! 😋