SCALE Codec


--- title: SCALE Codec description: SCALE Codec for web3 builders duration: 1 hour ---


By the end of this lecture, you will know why Substrate uses the SCALE codec and how each kind of data type is encoded.


SCALE

Simple Concatenated Aggregate Little-Endian

SCALE is a lightweight format that allows efficient encoding and decoding, which makes it highly suitable for resource-constrained execution environments like blockchain runtimes and low-power, low-memory devices.


Little-Endian

Little endian systems store the least significant byte at the smallest memory address.

Wasm is little-endian, so SCALE-encoded integers already match the in-memory byte order and need no byte swapping. This helps make SCALE very performant.


Why SCALE? Why not X?

  • Simple to define.
  • Not Rust-specific (but happens to work great in Rust).
    • Easy to derive codec logic: #[derive(Encode, Decode)]
    • Viable and useful for APIs like: MaxEncodedLen and TypeInfo
    • It does not use Rust std, and thus can compile to Wasm no_std.
  • Consensus critical / bijective; one value will always encode to one blob and that blob will only decode to that value.
  • Supports a copy-free decode for basic types on LE architectures.
  • It is about as thin and lightweight as can be.

SCALE is NOT Self-Descriptive

It is important to note that the encoding context (knowledge of how the types and data structures look) needs to be known separately at both encoding and decoding ends.

The encoded data does not include this contextual information.
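To make this concrete, here is a minimal sketch that uses only the Rust standard library (it does not call the SCALE crate). The helper names as_u8, as_i8 and as_u32 are made up for illustration. The same bytes produce different values depending on which type the reader assumes, because nothing in the bytes says what they are.

```rust
// Illustration only: plain std, not the parity-scale-codec API.
// The same raw bytes mean different things depending on the assumed type.

fn as_u8(bytes: &[u8]) -> u8 {
	bytes[0]
}

fn as_i8(bytes: &[u8]) -> i8 {
	i8::from_le_bytes([bytes[0]])
}

fn as_u32(bytes: &[u8]) -> u32 {
	u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

fn main() {
	let blob = [0xff, 0x00, 0x00, 0x00];
	println!("{}", as_u8(&blob)); // read as an unsigned byte
	println!("{}", as_i8(&blob)); // read as a signed byte
	println!("{}", as_u32(&blob)); // read as a 4-byte LE integer
}
```

Reading the first byte as u8 gives 255, reading it as i8 gives -1, and reading all four bytes as u32 gives 255 again. Only the decoder's knowledge of the type picks one of these.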


Example: SCALE vs JSON

use parity_scale_codec::{ Encode };

#[derive(Encode)]
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

fn main() {
	let my_struct = Example {
		number: 42,
		is_cool: true,
		optional: Some(69),
	};
	println!("{:?}", my_struct.encode());
	println!("{:?}", my_struct.encode().len());
}
[42, 1, 1, 69, 0, 0, 0]
7

For comparison, the same struct serialized to JSON with serde_json:
use serde::{ Serialize };

#[derive(Serialize)]
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

fn main() {
	let my_struct = Example {
		number: 42,
		is_cool: true,
		optional: Some(69),
	};
	println!("{:?}", serde_json::to_string(&my_struct).unwrap());
	println!("{:?}", serde_json::to_string(&my_struct).unwrap().len());
}
"{\"number\":42,\"is_cool\":true,\"optional\":69}"
42

Try It Yourself!

mkdir temp
cd temp
cargo init
cargo add parity-scale-codec --features derive

Little vs Big Endian Output

Reading the output while keeping endianness in mind can be confusing.

Endianness determines the order of the bytes in the vector. The hex or binary representation of each individual byte is the same regardless of endianness.

0b prefix denotes a binary representation, and 0x denotes a hex representation.

fn main() {
	println!("{:b}", 69i8);
	println!("{:02x?}", 69i8.to_le_bytes());
	println!("{:02x?}", 69i8.to_be_bytes());
	println!("{:b}", 42u16);
	println!("{:02x?}", 42u16.to_le_bytes());
	println!("{:02x?}", 42u16.to_be_bytes());
	println!("{:b}", 16777215u32);
	println!("{:02x?}", 16777215u32.to_le_bytes());
	println!("{:02x?}", 16777215u32.to_be_bytes());
}
1000101
[45]
[45]
101010
[2a, 00]
[00, 2a]
111111111111111111111111
[ff, ff, ff, 00]
[00, ff, ff, ff]

Fixed Width Integers

Basic integers are encoded using a fixed-width little-endian (LE) format.

use parity_scale_codec::Encode;

fn main() {
	println!("{:02x?}", 69i8.encode());
	println!("{:02x?}", 69u8.encode());
	println!("{:02x?}", 42u16.encode());
	println!("{:02x?}", 16777215u32.encode());
}
[45]
[45]
[2a, 00]
[ff, ff, ff, 00]

Notes:

Notice that the first two outputs are the same. SCALE is NOT descriptive of the type. The decoder is responsible for decoding this byte into some 1-byte-wide type, be it u8, i8, or something else.


Compact Integers

A "compact" or general integer encoding is sufficient for encoding large integers (up to 2^536 - 1) and is more efficient at encoding most values than the fixed-width version.

Though for single-byte values, the fixed-width integer is never worse.


Compact Prefix

  • 0b00: single-byte mode. The upper six bits are the LE encoding of the value. Valid only for values 0 through 63.
  • 0b01: two-byte mode. The upper six bits and the following byte are the LE encoding of the value. Valid only for values 64 through (2^14 - 1).
  • 0b10: four-byte mode. The upper six bits and the following three bytes are the LE encoding of the value. Valid only for values (2^14) through (2^30 - 1).
  • 0b11: big-integer mode. The upper six bits, plus four, give the number of bytes following. The value is contained, LE encoded, in the bytes following. The final (most significant) byte must be non-zero. Valid only for values (2^30) through (2^536 - 1).

Compact/general integers are encoded with the two least significant bits denoting the mode.
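The four modes can be sketched by hand with only the standard library. This is an illustration of the rules above, not the crate's implementation. The function name compact_encode is hypothetical, and the sketch stops at u128, so it covers only part of the big-integer range.

```rust
// Hand-rolled sketch of compact integer encoding (std only, hypothetical helper).
fn compact_encode(value: u128) -> Vec<u8> {
	if value < (1 << 6) {
		// Single-byte mode: value in the upper six bits, mode bits 0b00.
		vec![(value as u8) << 2]
	} else if value < (1 << 14) {
		// Two-byte mode: shift left by two, set mode bits 0b01, write as LE.
		(((value as u16) << 2) | 0b01).to_le_bytes().to_vec()
	} else if value < (1 << 30) {
		// Four-byte mode: shift left by two, set mode bits 0b10, write as LE.
		(((value as u32) << 2) | 0b10).to_le_bytes().to_vec()
	} else {
		// Big-integer mode: keep only the bytes up to the most significant
		// non-zero byte, and store (byte count - 4) in the upper six bits.
		let bytes = value.to_le_bytes();
		let len = 16 - (value.leading_zeros() / 8) as usize;
		let mut out = vec![(((len - 4) as u8) << 2) | 0b11];
		out.extend_from_slice(&bytes[..len]);
		out
	}
}

fn main() {
	for &value in &[0u128, 42, 69, 65535, 1 << 30] {
		println!("{:>10} -> {:02x?}", value, compact_encode(value));
	}
}
```

The first four values reproduce the bytes on the Compact Integers slides. The last value, 2^30, is the smallest one that needs big-integer mode.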


Compact Integers: 0

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 0u8.encode());
	println!("{:02x?}", 0u32.encode());
	println!("{:02x?}", AsCompact(0u8).encode());
	println!("{:02x?}", AsCompact(0u32).encode());
}
[00]
[00, 00, 00, 00]
[00]
[00]

Compact Integers: 42

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 42u8.encode());
	println!("{:02x?}", 42u32.encode());
	println!("{:02x?}", AsCompact(42u8).encode());
	println!("{:02x?}", AsCompact(42u32).encode());
}
[2a]
[2a, 00, 00, 00]
[a8]
[a8]
  • 42 as binary: 0b101010 = [0x2a].
  • Add 00 to the least significant bits.
  • 0b10101000 = [0xa8] = 168 as decimal.

Compact Integers: 69

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 69u8.encode());
	println!("{:02x?}", 69u32.encode());
	println!("{:02x?}", AsCompact(69u8).encode());
	println!("{:02x?}", AsCompact(69u32).encode());
}
[45]
[45, 00, 00, 00]
[15, 01]
[15, 01]
  • 69 as binary: 0b1000101 = [0x45].
  • Add 01 to the least significant bits.
  • 0b100010101 = [0x15, 0x01] = 277 as decimal.

Compact Integers: 65535 (u16::MAX)

use parity_scale_codec::{Encode, HasCompact};

#[derive(Encode)]
struct AsCompact<T: HasCompact>(#[codec(compact)] T);

fn main() {
	println!("{:02x?}", 65535u16.encode());
	println!("{:02x?}", 65535u32.encode());
	println!("{:02x?}", AsCompact(65535u16).encode());
	println!("{:02x?}", AsCompact(65535u32).encode());
}
[ff, ff]
[ff, ff, 00, 00]
[fe, ff, 03, 00]
[fe, ff, 03, 00]
  • 65535 as binary: 0b1111111111111111 = [0xff, 0xff].
  • Add 10 to the least significant bits.
  • 0b111111111111111110 = [0xfe, 0xff, 0x03, 0x00] = 262142 as decimal.

Compact Integers Are "Backwards Compatible"

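As the examples show, the compact encoding of a value does not depend on the width of its type, so you are able to "upgrade" a type (for example, from u8 to u32) without affecting the encoding.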
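The decoding side shows the same thing: a compact decoder reads the mode bits and never needs to know how wide the original type was. Below is a minimal std-only sketch. The name compact_decode is hypothetical, the sketch is limited to values that fit in u128, and it does not reject non-minimal encodings the way a strict decoder might.

```rust
// Hypothetical helper: decode a compact integer from the front of `input`,
// returning the value and the number of bytes consumed.
fn compact_decode(input: &[u8]) -> Option<(u128, usize)> {
	let first = *input.first()?;
	match first & 0b11 {
		0b00 => Some(((first >> 2) as u128, 1)),
		0b01 => {
			let b = input.get(..2)?;
			Some(((u16::from_le_bytes([b[0], b[1]]) >> 2) as u128, 2))
		}
		0b10 => {
			let b = input.get(..4)?;
			Some(((u32::from_le_bytes([b[0], b[1], b[2], b[3]]) >> 2) as u128, 4))
		}
		_ => {
			// Big-integer mode: upper six bits + 4 = number of bytes following.
			let len = (first >> 2) as usize + 4;
			if len > 16 {
				return None; // this sketch stops at u128
			}
			let body = input.get(1..1 + len)?;
			let mut buf = [0u8; 16];
			buf[..len].copy_from_slice(body);
			Some((u128::from_le_bytes(buf), 1 + len))
		}
	}
}

fn main() {
	println!("{:?}", compact_decode(&[0xa8]));
	println!("{:?}", compact_decode(&[0x15, 0x01]));
	println!("{:?}", compact_decode(&[0xfe, 0xff, 0x03, 0x00]));
}
```

The result is a plain number. Whether the caller stores it in a u8, u16, or u32 is decided afterwards, which is why widening the type leaves the bytes unchanged.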


Enum

Prefix with the variant index (a u8), followed by the encoded value, if any.

use parity_scale_codec::Encode;

#[derive(Encode)]
enum Example {
	First,
	Second(u8),
	Third(Vec<u8>),
	Fourth,
}

fn main() {
	println!("{:02x?}", Example::First.encode());
	println!("{:02x?}", Example::Second(2).encode());
	println!("{:02x?}", Example::Third(vec![0, 1, 2, 3, 4]).encode());
	println!("{:02x?}", Example::Fourth.encode());
}
[00]
[01, 02]
[02, 14, 00, 01, 02, 03, 04]
[03]
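What the derive does for this enum can be approximated by hand. The following std-only sketch writes the variant index and then the payload. The function encode_example is hypothetical. To stay short, it assumes the vector has fewer than 64 items, so its compact length prefix fits in one byte.

```rust
enum Example {
	First,
	Second(u8),
	Third(Vec<u8>),
	Fourth,
}

// Hypothetical hand-written encoder: variant index first, then the payload.
fn encode_example(value: &Example) -> Vec<u8> {
	match value {
		Example::First => vec![0],
		Example::Second(n) => vec![1, *n],
		Example::Third(items) => {
			// Assumption for brevity: fewer than 64 items, so the
			// compact length prefix is a single byte (len << 2).
			assert!(items.len() < 64);
			let mut out = vec![2, (items.len() as u8) << 2];
			out.extend_from_slice(items);
			out
		}
		Example::Fourth => vec![3],
	}
}

fn main() {
	println!("{:02x?}", encode_example(&Example::First));
	println!("{:02x?}", encode_example(&Example::Second(2)));
	println!("{:02x?}", encode_example(&Example::Third(vec![0, 1, 2, 3, 4])));
	println!("{:02x?}", encode_example(&Example::Fourth));
}
```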

Tuple and Struct

Just encode and concatenate the items.

use parity_scale_codec::Encode;

#[derive(Encode)]
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

fn main() {
	let my_struct = Example {
		number: 0,
		is_cool: true,
		optional: Some(69),
	};
	println!("{:02x?}", (0u8, true, Some(69u32)).encode());
	println!("{:02x?}", my_struct.encode());
}
[00, 01, 01, 45, 00, 00, 00]
[00, 01, 01, 45, 00, 00, 00]

Notes:

Note that the tuple and the struct encode identically, even though the struct has named fields.
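"Encode and concatenate" can be spelled out by hand. This std-only sketch is an illustration, not the derive's actual output, and encode_example is a hypothetical name. It pushes each field's bytes in declaration order. Field names never appear in the output.

```rust
struct Example {
	number: u8,
	is_cool: bool,
	optional: Option<u32>,
}

// Hypothetical hand-written encoder: fields in declaration order, no names, no separators.
fn encode_example(value: &Example) -> Vec<u8> {
	let mut out = Vec::new();
	out.push(value.number); // u8: one byte
	out.push(value.is_cool as u8); // bool: 0x00 or 0x01
	match value.optional {
		// Option: 0x01 followed by the inner value, or a lone 0x00 for None.
		Some(inner) => {
			out.push(1);
			out.extend_from_slice(&inner.to_le_bytes());
		}
		None => out.push(0),
	}
	out
}

fn main() {
	let my_struct = Example { number: 0, is_cool: true, optional: Some(69) };
	println!("{:02x?}", encode_example(&my_struct));
}
```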


Embedded Compact

use parity_scale_codec::Encode;

#[derive(Encode)]
struct Example {
	number: u64,
	#[codec(compact)]
	compact_number: u64,
}

#[derive(Encode)]
enum Choices {
	One(u64, #[codec(compact)] u64),
}

fn main() {
	let my_struct = Example { number: 42, compact_number: 1337 };
	let my_choice = Choices::One(42, 1337);
	println!("{:02x?}", my_struct.encode());
	println!("{:02x?}", my_choice.encode());
}
[2a, 00, 00, 00, 00, 00, 00, 00, e5, 14]
[00, 2a, 00, 00, 00, 00, 00, 00, 00, e5, 14]

Unit, Bool, Option, and Result

use parity_scale_codec::Encode;

fn main() {
	println!("{:02x?}", ().encode());
	println!("{:02x?}", true.encode());
	println!("{:02x?}", false.encode());
	println!("{:02x?}", Ok::<u32, ()>(42u32).encode());
	println!("{:02x?}", Err::<u32, ()>(()).encode());
	println!("{:02x?}", Some(42u32).encode());
	println!("{:02x?}", None::<u32>.encode());
}
[]
[01]
[00]
[00, 2a, 00, 00, 00]
[01]
[01, 2a, 00, 00, 00]
[00]

Arrays, Vectors, and Strings

  • Arrays: Just concatenate the items (the length is part of the type, so it is not encoded).
  • Vectors: Concatenate the items, prefixed with the length (compact encoded).
  • Strings: Encoded as a Vec<u8> of UTF-8 bytes.
use parity_scale_codec::Encode;

fn main() {
	println!("{:02x?}", [0u8, 1u8, 2u8, 3u8, 4u8].encode());
	println!("{:02x?}", vec![0u8, 1u8, 2u8, 3u8, 4u8].encode());
	println!("{:02x?}", "hello".encode());
	println!("{:02x?}", vec![0u8; 1024].encode());
}
[00, 01, 02, 03, 04]
[14, 00, 01, 02, 03, 04]
[14, 68, 65, 6c, 6c, 6f]
[01, 10, 00, 00, ... snip ... , 00]

Notes:

Note that the length prefix can be multiple bytes, like the last example.
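The length prefix can be reproduced by hand with the first two compact modes. This is a std-only sketch with hypothetical helper names (compact_len, encode_bytes, encode_str). It only handles lengths below 2^14, which is enough to cover the examples above.

```rust
// Hypothetical helper: compact-encode a length below 2^14.
fn compact_len(len: usize) -> Vec<u8> {
	assert!(len < (1 << 14), "sketch only covers one- and two-byte modes");
	if len < 64 {
		vec![(len as u8) << 2]
	} else {
		(((len as u16) << 2) | 0b01).to_le_bytes().to_vec()
	}
}

// A byte vector is its compact length followed by the raw items.
fn encode_bytes(items: &[u8]) -> Vec<u8> {
	let mut out = compact_len(items.len());
	out.extend_from_slice(items);
	out
}

// A string is treated as the vector of its UTF-8 bytes.
fn encode_str(s: &str) -> Vec<u8> {
	encode_bytes(s.as_bytes())
}

fn main() {
	println!("{:02x?}", encode_bytes(&[0, 1, 2, 3, 4]));
	println!("{:02x?}", encode_str("hello"));
	let big = encode_bytes(&vec![0u8; 1024]);
	println!("{:02x?} ... ({} bytes total)", &big[..2], big.len());
}
```

For 1024 items the prefix is (1024 << 2) | 0b01 = 4097 = 0x1001, written little-endian as [01, 10]. That is the two-byte prefix seen in the last example.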


Decoding

We can similarly take raw bytes and decode them into a well-known type.

Metadata can be used to convey to a program how to decode a type properly...

But bad or no information means the proper format for the data cannot be known.


Decoding Examples

use parity_scale_codec::{ Encode, Decode, DecodeAll };

fn main() {
	let array = [0u8, 1u8, 2u8, 3u8];
	let value: u32 = 50462976;

	println!("{:02x?}", array.encode());
	println!("{:02x?}", value.encode());
	println!("{:?}", u32::decode(&mut &array.encode()[..]));
	println!("{:?}", u16::decode(&mut &array.encode()[..]));
	println!("{:?}", u16::decode_all(&mut &array.encode()[..]));
	println!("{:?}", u64::decode(&mut &array.encode()[..]));
}
[00, 01, 02, 03]
[00, 01, 02, 03]
Ok(50462976)
Ok(256)
Err(Error { cause: None, desc: "Input buffer has still data left after decoding!" })
Err(Error { cause: None, desc: "Not enough data to fill buffer" })

Notes:

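  • Decoding can fail: there may be too few bytes for the target type, or bytes left over when using decode_all.
  • Values can decode badly: the bytes of one type can decode without error as a different type (here, the 4-byte array decodes as the u32 50462976).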
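The two failure modes can be sketched without the crate. The code below is a std-only illustration. DecodeError, take, and the decode_* functions are hypothetical names, and the real crate's error type and messages differ. It follows the same convention of advancing a `&mut &[u8]` cursor as bytes are consumed.

```rust
#[derive(Debug, PartialEq)]
enum DecodeError {
	NotEnoughData,
	TrailingData,
}

// Hypothetical helper: take exactly N bytes from the front and advance the cursor.
fn take<const N: usize>(input: &mut &[u8]) -> Result<[u8; N], DecodeError> {
	let data = *input;
	if data.len() < N {
		return Err(DecodeError::NotEnoughData);
	}
	let mut buf = [0u8; N];
	buf.copy_from_slice(&data[..N]);
	*input = &data[N..];
	Ok(buf)
}

fn decode_u16(input: &mut &[u8]) -> Result<u16, DecodeError> {
	Ok(u16::from_le_bytes(take::<2>(input)?))
}

fn decode_u32(input: &mut &[u8]) -> Result<u32, DecodeError> {
	Ok(u32::from_le_bytes(take::<4>(input)?))
}

fn decode_u64(input: &mut &[u8]) -> Result<u64, DecodeError> {
	Ok(u64::from_le_bytes(take::<8>(input)?))
}

// Like decode_u16, but refuses input that has bytes left over afterwards.
fn decode_u16_all(mut input: &[u8]) -> Result<u16, DecodeError> {
	let value = decode_u16(&mut input)?;
	if !input.is_empty() {
		return Err(DecodeError::TrailingData);
	}
	Ok(value)
}

fn main() {
	let bytes = [0u8, 1, 2, 3];
	println!("{:?}", decode_u32(&mut &bytes[..]));
	println!("{:?}", decode_u16(&mut &bytes[..]));
	println!("{:?}", decode_u16_all(&bytes[..]));
	println!("{:?}", decode_u64(&mut &bytes[..]));
}
```

Decoding the four bytes as a u16 succeeds and silently ignores the last two. Only the stricter "all" variant reports the leftover bytes.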

Decode Limits

  • Decoding isn't free!
  • The more complex the type being decoded, the more computation is needed to decode the value.
  • When decoding input you do not control, you generally want to use decode_with_depth_limit.
  • Substrate uses a limit of 256.

Decode Bomb

Here is an example of a decode bomb.

use parity_scale_codec::{ Encode, Decode, DecodeLimit };

#[derive(Encode, Decode, Debug)]
enum Example {
	First,
	Second(Box<Self>),
}

fn main() {
	let bytes = vec![1, 1, 1, 1, 1, 0];
	println!("{:?}", Example::decode(&mut &bytes[..]));
	println!("{:?}", Example::decode_with_depth_limit(10, &mut &bytes[..]));
	println!("{:?}", Example::decode_with_depth_limit(3, &mut &bytes[..]));
}
Ok(Second(Second(Second(Second(Second(First))))))
Ok(Second(Second(Second(Second(Second(First))))))
Err(Error { cause: Some(Error { cause: Some(Error { cause: Some(Error { cause: Some(Error { cause: None, desc: "Maximum recursion depth reached when decoding" }), desc: "Could not decode `Example::Second.0`" }), desc: "Could not decode `Example::Second.0`" }), desc: "Could not decode `Example::Second.0`" }), desc: "Could not decode `Example::Second.0`" })
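The idea behind a depth limit can be sketched as a recursive decoder that carries a budget and refuses to nest further once it runs out. This is a std-only illustration with a hypothetical function decode_limited. The real crate may count depth differently and reports errors in its own format.

```rust
#[derive(Debug, PartialEq)]
enum Example {
	First,
	Second(Box<Example>),
}

// Hypothetical decoder: `depth_left` is how many more nested levels are allowed.
fn decode_limited(input: &mut &[u8], depth_left: u32) -> Result<Example, &'static str> {
	let data = *input;
	let (&tag, rest) = data.split_first().ok_or("Not enough data")?;
	*input = rest;
	match tag {
		0 => Ok(Example::First),
		1 => {
			if depth_left == 0 {
				return Err("Maximum recursion depth reached");
			}
			// Each nested Second spends one unit of the budget.
			let inner = decode_limited(input, depth_left - 1)?;
			Ok(Example::Second(Box::new(inner)))
		}
		_ => Err("Unknown variant index"),
	}
}

fn main() {
	let bytes = vec![1u8, 1, 1, 1, 1, 0];
	println!("{:?}", decode_limited(&mut &bytes[..], 10));
	println!("{:?}", decode_limited(&mut &bytes[..], 3));
}
```

Without such a budget, a few kilobytes of 0x01 bytes would force the decoder to recurse and allocate once per byte. That is the attack a depth limit is meant to cap.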

Exceptions: BTreeSet

A BTreeSet will decode from an unordered sequence of items, but it sorts them in the process, so re-encoding the set can produce different bytes than the ones it was decoded from.

Be careful... this one isn't bijective.

use parity_scale_codec::{ Encode, Decode, alloc::collections::BTreeSet };

fn main() {
	let vector = vec![4u8, 3u8, 2u8, 1u8, 0u8];
	let vector_encoded = vector.encode();
	let btree = BTreeSet::<u8>::decode(&mut &vector_encoded[..]).unwrap();
	let btree_encoded = btree.encode();

	println!("{:02x?}", vector_encoded);
	println!("{:02x?}", btree_encoded);
}
[14, 04, 03, 02, 01, 00]
[14, 00, 01, 02, 03, 04]
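The loss of information comes from the set itself, not from the byte format. A std-only sketch shows it (the helper roundtrip_through_set is hypothetical): once items pass through a BTreeSet, their original order is gone, so several distinct inputs collapse to the same output.

```rust
use std::collections::BTreeSet;

// Hypothetical helper: push items through a BTreeSet and read them back out.
fn roundtrip_through_set(items: &[u8]) -> Vec<u8> {
	let set: BTreeSet<u8> = items.iter().copied().collect();
	set.into_iter().collect()
}

fn main() {
	// Two different inputs...
	let a = roundtrip_through_set(&[4u8, 3, 2, 1, 0]);
	let b = roundtrip_through_set(&[0u8, 1, 2, 3, 4]);
	// ...come back identical, so the mapping is not one-to-one.
	println!("{:?}", a);
	println!("{:?}", b);
}
```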

Optimizations and Tricks

  • DecodeLength: Read the length of a collection (like Vec) without decoding everything.

  • EncodeAppend: Append an item without decoding all the other items. (like Vec)
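Both tricks rely on the compact length prefix sitting at the front of an encoded collection. The sketch below shows the idea with std only. read_len, write_len and append_byte are hypothetical helpers, not the crate's DecodeLength or EncodeAppend traits, and they handle only byte vectors with lengths below 2^14.

```rust
// Hypothetical helper: read the compact length prefix, returning
// (item count, prefix size in bytes) without touching the items.
fn read_len(encoded: &[u8]) -> Option<(usize, usize)> {
	let first = *encoded.first()?;
	match first & 0b11 {
		0b00 => Some(((first >> 2) as usize, 1)),
		0b01 => {
			let second = *encoded.get(1)?;
			Some(((u16::from_le_bytes([first, second]) >> 2) as usize, 2))
		}
		_ => None, // larger modes are left out of this sketch
	}
}

// Hypothetical helper: compact-encode a length below 2^14.
fn write_len(len: usize) -> Option<Vec<u8>> {
	if len < 64 {
		Some(vec![(len as u8) << 2])
	} else if len < (1 << 14) {
		Some((((len as u16) << 2) | 0b01).to_le_bytes().to_vec())
	} else {
		None
	}
}

// Append one byte to an encoded Vec<u8>: rewrite the prefix, keep the items as they are.
fn append_byte(encoded: &[u8], item: u8) -> Option<Vec<u8>> {
	let (len, prefix_size) = read_len(encoded)?;
	let mut out = write_len(len + 1)?;
	out.extend_from_slice(&encoded[prefix_size..]);
	out.push(item);
	Some(out)
}

fn main() {
	let encoded = [0x14, 0, 1, 2, 3, 4];
	println!("{:?}", read_len(&encoded));
	println!("{:02x?}", append_byte(&encoded, 5));
}
```

Note that the prefix can grow when an append crosses a mode boundary (for example, from 63 to 64 items). An append routine therefore has to rewrite the prefix rather than just increment a byte.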


Implementations

SCALE Codec has also been implemented in a number of other languages.


Missing Some Metadata?

To make SCALE useful as an encoding format within the Substrate and Polkadot ecosystem, we need to figure out a way to provide metadata about all the types we will expect, and when we will expect them.

HINT: We do.


Remember, at the end of the day, everything is just 0's and 1's.

